

AD  A099676 


Final Report:

COMPUTERIZED  ADAPTIVE 
ABILITY  TESTING 


David  J.  Weiss 


April  1981 




Computerized  Adaptive  Testing  Laboratory 
Psychometric  Methods  Program 
Department  of  Psychology 
University  of  Minnesota 
Minneapolis  MN  55455 


Final  Report  of  Project  NR150-431,  N00014-79-C-0324 
Supported by funds from the

Office of Naval Research, the Air Force Human Resources
Laboratory, the Air Force Office of Scientific Research, and the Army
Research Institute, and monitored by the Office of Naval Research
David  J.  Weiss,  Principal  Investigator 


Approved for public release; distribution unlimited.
Reproduction in whole or in part is permitted for
any purpose of the United States Government.




Unclassified

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER   2. GOVT ACCESSION NO.   3. RECIPIENT'S CATALOG NUMBER


4. TITLE (and Subtitle)

Final Report: Computerized Adaptive Ability Testing


5. TYPE OF REPORT & PERIOD COVERED

Final Report, 1 April 1979 to 30 June 1980


7. AUTHOR(s)

David J. Weiss

8. CONTRACT OR GRANT NUMBER(s)

9. PERFORMING ORGANIZATION NAME AND ADDRESS

Department of Psychology
University of Minnesota
Minneapolis, MN 55455

11. CONTROLLING OFFICE NAME AND ADDRESS

Personnel  and  Training  Research  Programs 
Office  of  Naval  Research 
Arlington,  VA  22217 


N00014-79-C-0324


10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

[illegible]


12. REPORT DATE: April 1981


13. NUMBER OF PAGES: [illegible]


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report): Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE


16. DISTRIBUTION STATEMENT (of this Report)


Approved  for  public  release;  distribution  unlimited.  Reproduction  in  whole 
or  in  part  is  permitted  for  any  purpose  of  the  United  States  Government. 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


18. SUPPLEMENTARY NOTES

This  research  was  supported  by  funds  from  the  Office  of  Naval  Research,  the 
Air  Force  Human  Resources  Laboratory,  Air  Force  Office  of  Scientific  Research, 
and  the  Army  Research  Institute,  and  monitored  by  the  Office  of  Naval  Research. 


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)


Testing
Ability Testing
Sequential Testing
Branched Testing
Computerized Testing
Individualized Testing
Adaptive Testing
Tailored Testing
Programmed Testing
Response-Contingent Testing
Automated Testing
Item Characteristic Curve Theory
Item Response Theory
Latent-Trait Test Theory
Person Fit

The objectives and approach of this 15-month research program are described.
These objectives included (1) evaluation of the performance of adaptive testing
strategies under conditions that more realistically represent those under which
the strategies might be applied in live testing, including effects of errors in
item parameters on the performance of adaptive testing strategies and comparisons
of adaptive testing strategies in live testing; (2) evaluation of the utility for
adaptive testing of a number of test item formats and response modes that might
be used as replacements for the multiple-choice item;



(3) investigation of the utility of a number of person-fit indices designed to
identify lack of fit of individuals to item response theory models; and (4)
investigation of the potential of several types of cognitive information-processing
tasks for computerized adaptive administration. Research approaches and
preliminary results are summarized for each objective. Additional research
plans currently being implemented under Project NR 150-433, with which this
project was combined, are described.




CONTENTS

Introduction ... 1
Adaptive Testing Strategies ... 1
Objective ... 1
Effects of Errors in Item Parameter Estimates on Adaptive Testing Strategies ... 2
Approach ... 2
Results ... 2
Additional Research in Progress ... 6
Live-Testing Comparison of Adaptive Testing Strategies ... 6
Approach ... 6
Results ... 7
Additional Research in Progress ... 8
Future Research Plans ... 8
Item Formats and Response Modes ... 9
Objective ... 9
Approach ... 9
Results ... 11
Additional Research in Progress ... 13
Fit of Individuals to Item Response Theory Models ... 13
Objective ... 14
Approach ... 14
Additional Research Plans ... 15
New Types of Ability Tests ... 16
Objective ... 16
Approach ... 17
Memory for Patterns ... 17
Digit Span ... 19
Analysis and Future Plans ... 20
References ... 21


FINAL REPORT:

COMPUTERIZED  ADAPTIVE  ABILITY  TESTING 


This research program was designed to study four areas relevant to adaptive
ability testing:

1. Evaluation of adaptive testing branching strategies.

2. Use of different item formats and response modes.

3. Fit of individuals to item response theory models.

4. New types of ability tests designed specifically for
computerized adaptive administration.

Research in pursuance of these objectives, originally scheduled for a
three-year period, began on April 1, 1979, and continued through June 30, 1980,
at which time the research objectives of this project were combined with those
of Project NR 150-433, "Computerized Adaptive Achievement Testing."

This report summarizes the progress made during this 15-month period. No
technical reports were completed during this period; technical reports begun in
this project will be completed under Project NR 150-433. For each of the four
objectives listed above, this report (1) describes the objective, (2) details
the approaches used to study the objective, (3) summarizes results that were
available at the completion of the reporting period, and (4) describes tentative
plans for further research on the objective to be continued in Project NR
150-433.


Adaptive  Testing  Strategies 

Previous simulation studies of adaptive testing have used relatively
unrealistic item pools (e.g., Gorman, 1980; McBride, 1976; Reckase, 1976; Urry,
1970, 1971; Vale, 1975). These item pools have been unrealistic because they
assumed that the item parameters describing the items in the pool were completely
error free, and because they assumed item difficulties and parameter distributions
that did not reflect those of real ability tests. Previous studies have also been
unrealistic in that they assumed that the responses of the hypothetical testees
to these items conformed precisely to the unidimensional latent trait model. When
the results of previous simulation studies are extrapolated to real item pools
constructed from real item parameters, they may not generalize, because real item
pools are constructed from item parameters that include estimation error and may
deviate substantially from unidimensionality.

Objective 

The objectives of this research program were to evaluate the performance of
adaptive testing strategies under conditions that more realistically represent
those under which the strategies might be used in live-testing applications,
and to compare findings from selected simulation studies with those obtained in
live testing. Research during the reporting period was concerned with (1) effects
of errors in item parameters on the performance of adaptive testing strategies
and (2) live-testing comparisons of adaptive testing strategies.

Effects of Errors in Item Parameter Estimates
on Adaptive Testing Strategies

Approach. This objective was pursued by means of Monte Carlo simulation
studies that built upon empirical information regarding the nature and extent of
errors in item parameter estimates as a function of the numbers of testees and
items on which the item response theory (IRT) item parameters were estimated. The
kinds and degrees of error observed in real data with different IRT item
parameterization methods were translated into the Monte Carlo simulation model
and served as independent variables in a series of studies that systematically
varied the magnitude and kind of estimation error for the difficulty,
discrimination, and "guessing" parameters, separately and in combination.
Dependent variables in these studies were test information, bias, correlation of
ability estimates with true ability, and other characteristics of the ability
estimates derived from the application of selected adaptive testing strategies;
two conventional tests were also included in the study for comparison purposes.
The studies were also designed to use an item pool that realistically reflected
the composition of real item pools used in actual ability tests, in terms of the
distributions of the IRT item parameter estimates.

Figure 1 summarizes the design of this study. Using a three-parameter IRT
model and an item pool designed to reflect an adaptive testing item pool that
had been used in a live-testing study, Monte Carlo data were generated for 100
simulees at each of 17 levels of ability, ranging from θ = -3.2 to θ = +3.2.
Based on data available in the IRT item parameterization literature, varying
degrees of error were added to the parameter estimates for item discrimination
(a), difficulty (b), and "guessing" (c). Table 1 shows the item parameter sets
used in this study: Set 1 was the baseline comparison data set, in which there
was no error in the item parameter estimates; in Sets 11, 12, and 13, varying
amounts of error were added to the a parameter; in Sets 21, 22, 23, and 24, error
was added to the b parameter; in Sets 31 and 32, error was added to the c
parameter; and in Sets 41 and 42, errors occurred in all three parameters
simultaneously.
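The data-generation step of this design can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the study's simulation program: it assumes the three-parameter logistic model with scaling constant D = 1.7, treats each "specified RMSE" in Table 1 as the standard deviation of independent normal error added to the true parameter, and uses hypothetical function names (`p_correct`, `add_error`, `simulate_response`). Responses are always generated from the true parameters; only the adaptive test would see the error-laden copy.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed; not stated in the report)


def p_correct(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def add_error(items, rmse_a=0.0, rmse_b=0.0, rmse_c=0.0, rng=random):
    """Return an error-laden copy of an item pool of (a, b, c) tuples.

    Hypothetical error model: independent normal error with the given SDs;
    a is kept positive and c is kept within [0, 0.99].
    """
    noisy = []
    for a, b, c in items:
        a_hat = max(0.01, a + rng.gauss(0.0, rmse_a))
        b_hat = b + rng.gauss(0.0, rmse_b)
        c_hat = min(0.99, max(0.0, c + rng.gauss(0.0, rmse_c)))
        noisy.append((a_hat, b_hat, c_hat))
    return noisy


def simulate_response(theta, item, rng=random):
    """Score 1 or 0 for one simulee on one item, using the TRUE item parameters."""
    return 1 if rng.random() < p_correct(theta, *item) else 0


# 17 ability levels from -3.2 to +3.2 in steps of 0.4, with 100 simulees at each
THETA_LEVELS = [round(-3.2 + 0.4 * k, 1) for k in range(17)]
SIMULEES = [theta for theta in THETA_LEVELS for _ in range(100)]
```

For example, `add_error(true_pool, rmse_a=0.40)` applied to a hypothetical `true_pool` list of (a, b, c) tuples would mimic the specified error for Set 12 (moderate error in a) under these assumptions.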

Using the error-laden item parameter sets, three types of adaptive tests
(stratified adaptive, or stradaptive; maximum information; and Bayesian) were
administered to each of the 1,700 simulees. All tests were scored by maximum
likelihood at test lengths of 5 to 30 items, in increments of 5 items. In
addition, both peaked and rectangular conventional tests were constructed using
classical test construction procedures; these tests, along with the adaptive
tests using the error-free item pool, were also administered to the 1,700
simulees. Testing strategies were compared in terms of fidelity (the correlation
of true and estimated θ levels), observed and theoretical information, efficiency,
inaccuracy, bias, and root mean square error (RMSE) of the θ estimates.
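To make the maximum information strategy and maximum likelihood scoring concrete, here is a minimal sketch of one simulated administration. It assumes the three-parameter logistic model with D = 1.7, a starting estimate of θ = 0, and a bounded grid search over [-4, +4] for the maximum likelihood estimate, so that all-correct or all-incorrect patterns (which have no finite maximum likelihood estimate) are held at the grid boundary. The function names are hypothetical, and the stradaptive and Bayesian strategies are not shown.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed; not stated in the report)
GRID = [-4.0 + 0.01 * k for k in range(801)]  # candidate theta values from -4 to +4


def p_correct(theta, a, b, c):
    """3PL probability of a correct response, kept strictly inside (0, 1)."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return min(max(p, 1e-12), 1.0 - 1e-12)


def item_information(theta, a, b, c):
    """Birnbaum's item information function for the 3PL model."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def ml_estimate(items, responses):
    """Maximum likelihood estimate of theta by grid search (bounded at +/-4)."""
    def log_likelihood(theta):
        total = 0.0
        for item, u in zip(items, responses):
            p = p_correct(theta, *item)
            total += math.log(p if u else 1.0 - p)
        return total
    return max(GRID, key=log_likelihood)


def max_info_test(true_theta, est_pool, true_pool, length, rng=random):
    """Fixed-length maximum information adaptive test with ML rescoring.

    Items are chosen with the estimated (possibly error-laden) parameters,
    responses come from the true parameters, and theta is re-estimated
    after each item. Returns the final estimate and the items administered.
    """
    theta_hat, used, responses = 0.0, [], []
    for _ in range(length):
        remaining = [k for k in range(len(est_pool)) if k not in used]
        best = max(remaining, key=lambda k: item_information(theta_hat, *est_pool[k]))
        used.append(best)
        p_true = p_correct(true_theta, *true_pool[best])
        responses.append(1 if rng.random() < p_true else 0)
        theta_hat = ml_estimate([est_pool[k] for k in used], responses)
    return theta_hat, used
```

Passing an error-laden copy of the pool as `est_pool` and the error-free pool as `true_pool` reproduces the logic of the design: selection and scoring use the flawed parameters, while the simulee responds according to the true ones. The grid bound is a shortcut; many implementations instead use a fixed step rule until a mixed response pattern appears.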

Results. Table 2 presents a selection of the results for four of the dependent
variables. The fidelity measure was computed on a normally distributed sample of
300 simulees; data for the other criterion measures were averaged across the
1,700 simulees. As Table 2 shows, with the exception of Item Set 42, which
represented extreme (and probably unrealistic) levels of error in all three item
parameters, the adaptive tests with error-laden item parameters achieved higher
fidelities at all test lengths than did the peaked (P) and rectangular (R)
conventional tests, with larger differences occurring at shorter test lengths.
There were virtually no differences in fidelity among the adaptive strategies at
the 20- or 30-item test lengths, although the maximum information (MI) adaptive
test tended to perform somewhat more poorly than the stratified adaptive (SA) or
Bayesian (B) tests at the 10-item test length. Results for the other dependent
measures tended to support the fidelity analysis; that is, with the exception of
Item Set 42, adaptive tests using error-laden item parameter estimates generally
achieved scores with lower levels of inaccuracy, bias, and RMSE than did
conventional tests of the same lengths using error-free item parameter estimates.

Figure 1
Design of Monte Carlo Simulation Study of Effects of
Errors in Item Parameter Estimates on Adaptive Testing Strategies

Table 1
Error Simulated in Item Parameter Estimate Sets

Item                                  Specified RMSE      Obtained RMSE      Obtained r (true with estimated)
Set   Description                     a     b     c       a     b     c       a      b      c
1     Error-Free Item Set            .00   .00   .00     .00   .00   .00     1.00   1.00   1.00
11    Small Error in a               .20   .00   .00     .22   .00   .00      .75   1.00   1.00
12    Moderate Error in a            .40   .00   .00     .39   .00   .00      .56   1.00   1.00
13    Large Error in a               .60   .00   .00     .52   .00   .00      .37   1.00   1.00
21    Moderate Error in b            .00   [?]   .00     .00   .09   .00     1.00    .99   1.00
22    Large Error in b               .00   .30   .00     .00   .30   .00     1.00    .98   1.00
23    Extreme Error in b             .00  1.00   .00     .00   .89   .00     1.00    [?]   1.00
24    Very Large Error in b          .00   .50   .00     .00   .48   .00     1.00    .96   1.00
31    Moderate Error in c            .00   .00   .04     .00   .00   .04     1.00   1.00    .70
32    Large Error in c               .00   .00   .08     .00   .00   .08     1.00   1.00    .46
41    Worst Probable Combined Error  .60   .30   .08     .51   .32   .08      .47    .98    .46
42    Extreme Combined Error         .60   [?]   .08     .58   .97   [?]      .44    .88    .43

[?] = illegible.

[Table 2: selected results for four dependent variables, by item parameter set, testing strategy, and test length]

Analyses of the data in terms of dependent measures conditioned on values of θ
(inaccuracy, bias, RMSE, the two information measures, and efficiency) supported
the findings from the overall analysis. When errors occurred in the a, b, and c
parameters separately, there was very little effect on these indices, and the
adaptive tests measured better than the conventional tests at virtually all
levels of θ. There was essentially no measurement degradation as the result of
errors in a and c, and only a slightly greater effect for errors in b. For
realistic values of combined error in the three item parameters, some
measurement degradation occurred for the adaptive tests, but they still measured
better than the rectangular conventional test at all θ levels, and better than
the peaked conventional test over about three-fourths of the θ scale. There were
few consistent differences in the performance of the different adaptive testing
strategies.
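The bias and RMSE criteria, computed overall and conditionally on θ, and the fidelity correlation can be sketched as follows. This minimal sketch assumes the common definitions bias = mean(estimated θ minus true θ) and RMSE = square root of the mean squared difference, with fidelity taken as the Pearson correlation of true and estimated θ. These definitions are assumptions; the report's exact formulas, and its inaccuracy and efficiency indices, are not reproduced here. Function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson product-moment correlation (used here as 'fidelity')."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bias_and_rmse(errors):
    """Mean error and root mean square error of (estimate - true) differences."""
    n = len(errors)
    return sum(errors) / n, math.sqrt(sum(e * e for e in errors) / n)


def conditional_criteria(true_thetas, estimates):
    """Bias and RMSE at each distinct true theta level, plus the overall values."""
    by_level = defaultdict(list)
    for t, est in zip(true_thetas, estimates):
        by_level[t].append(est - t)
    conditional = {t: bias_and_rmse(errs) for t, errs in sorted(by_level.items())}
    overall = bias_and_rmse([est - t for t, est in zip(true_thetas, estimates)])
    return conditional, overall
```

With 100 simulees at each of the 17 θ levels, `conditional_criteria` returns one (bias, RMSE) pair per level, which is the form of result needed to compare strategies along the θ scale.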

Additional research in progress. The results of this study indicated very
little effect of errors in item parameter estimates on the measurement
performance of adaptive testing strategies. Because this study was the first to
investigate this question, it was necessarily limited in a number of ways.
Consequently, further simulations are planned that (1) vary the characteristics
of the item pool, in terms of levels of the three item parameters, to determine
the generality of the findings across item pools; (2) allow correlated errors to
occur in the item parameter estimates, since only uncorrelated errors were used
in this study; and (3) examine the effects of error in item parameter estimates
separately for one-, two-, and three-parameter IRT models.
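One way to generate the correlated estimation errors proposed in point (2) is to draw bivariate normal errors for a and b. The sketch below uses the standard two-variable Cholesky construction; the report does not say how correlated errors would be modeled, so the normal model, the correlation value, and the function name are all hypothetical.

```python
import math
import random


def correlated_errors(sd_a, sd_b, rho, rng=random):
    """Draw one (error_a, error_b) pair from a bivariate normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    error_a = sd_a * z1
    # Mixing z1 into error_b induces correlation rho while keeping its SD at sd_b
    error_b = sd_b * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return error_a, error_b
```

Adding such pairs to the true a and b values of each item, instead of independent draws, would let the RMSE of each parameter stay at its specified level while the errors covary.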

Live-Testing  Comparison  of  Adaptive  Testing  Strategies 

Approach. Three testing strategies (peaked conventional, Bayesian adaptive, and
maximum information adaptive) were compared on the basis of alternate forms
reliability and observed information. The tests were composed of 60 five-choice
vocabulary items that were divided into two 30-item alternate forms. The
conventional test was peaked on the basis of item information values evaluated
at θ = 0.0. Items administered in the maximum information and Bayesian testing
strategies were selected according to their adaptive item selection routines.
There were 373 students in the conventional testing condition, 390 in the
Bayesian testing condition, and 233 in the maximum information testing condition.

Testing strategy was the major independent variable of interest. Methods of
scoring were also compared: logistic maximum likelihood scoring, Bayesian
scoring, and (for the conventional test) proportion-correct scoring. Test length
was a third independent variable of interest; thirty test lengths were obtained
by scoring each 30-item test at each length from 1 to 30 items. Testing
strategies were compared on the basis of alternate forms reliability by
correlating corresponding ability estimates obtained from Forms A and B for a
given testing strategy.
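The test-length analysis amounts to rescoring each form on its first L items and correlating Form A and Form B scores across examinees, for L = 1 to 30. A minimal sketch follows, shown with proportion-correct scoring of a conventional test. The data layout and function names are hypothetical, and for the adaptive tests the scoring function would also need the parameters of the items each examinee actually received.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def proportion_correct(responses):
    """Proportion-correct score for a list of 0/1 responses."""
    return sum(responses) / len(responses)


def reliability_by_length(form_a, form_b, score=proportion_correct):
    """Alternate forms reliability at every test length from 1 to full length.

    form_a[i] and form_b[i] are examinee i's 0/1 responses, in order of
    administration, to the two alternate forms.
    """
    full_length = len(form_a[0])
    result = {}
    for length in range(1, full_length + 1):
        scores_a = [score(r[:length]) for r in form_a]
        scores_b = [score(r[:length]) for r in form_b]
        result[length] = pearson(scores_a, scores_b)
    return result
```

Very short test lengths can produce scores with no variance (for example, everyone answering the first item correctly), in which case the correlation is undefined; the sketch reports NaN rather than a misleading value.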

Since the test data were scored in at least two ways (Bayesian and maximum
likelihood), a total of seven combinations of testing strategy and scoring
method were compared on the basis of alternate forms reliability. Scoring
methods were compared by examining the reliabilities of a single testing
strategy scored by more than one method. Three of the alternate forms
reliabilities paired the appropriate scoring method with each of the three
testing strategies: proportion-correct scoring of the conventional test, maximum
likelihood scoring of the maximum information test, and Bayesian scoring of the
Bayesian-administered test. The remaining four alternate forms reliabilities were
obtained by scoring the item response data with a scoring routine other than the
appropriate one. In this way, reliabilities were obtained for Bayesian scoring of
the maximum information test, maximum likelihood scoring of the Bayesian test,
Bayesian scoring of the conventional test, and maximum likelihood scoring of the
conventional test. Reliabilities were calculated as a function of test length.
Scoring method correlations were obtained by correlating estimates obtained from
different scorings of the same testing strategy; these correlations were used to
analyze the similarity of ability estimates obtained from different scoring
methods applied to a single set of data.
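As a concrete example of scoring the same responses a second way, the sketch below computes a Bayesian estimate as the mean of the posterior distribution (an EAP estimate) under a standard normal prior, using a simple grid for numerical integration; it also returns the posterior variance. This is one common form of Bayesian scoring, assumed here for illustration; it is not necessarily the Bayesian procedure used in the study. The 3PL model with D = 1.7 and the function names are also assumptions.

```python
import math

D = 1.7  # logistic scaling constant (assumed; not stated in the report)


def p_correct(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def eap_estimate(items, responses, n_points=81, lo=-4.0, hi=4.0):
    """Posterior mean and variance of theta under a N(0, 1) prior (grid approximation)."""
    nodes = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    weights = []
    for theta in nodes:
        w = math.exp(-0.5 * theta * theta)  # unnormalized standard normal prior
        for (a, b, c), u in zip(items, responses):
            p = p_correct(theta, a, b, c)
            w *= p if u else 1.0 - p  # multiply in the likelihood of each response
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(nodes, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(nodes, weights)) / total
    return mean, var
```

Unlike the maximum likelihood estimate, this estimate is finite for all-correct and all-incorrect patterns, because the prior pulls it toward the center of the θ scale.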

The three testing strategies were also compared on the basis of their errors of
measurement. This was assessed in two ways: (1) using estimated errors of
measurement derived from maximum likelihood scoring and (2) using estimated
errors of measurement from Bayesian scoring. In the first method, item responses
were scored by maximum likelihood, and the standard error of measurement (SEM)
associated with each ability estimate was calculated. These values are the
reciprocal of the square root of test information at a given θ level and
estimate the standard deviation of the estimated θ values around the true θ
value; the larger the SEM, the more likely the estimate is to be inaccurate. The
posterior variance of the Bayesian ability estimate was the second index used to
compare the testing strategies on the basis of measurement accuracy. Both the SEM
and the posterior variance were examined as a function of estimated ability
level.

Results. A preliminary report of the results of this study is in Johnson and
Weiss (1980). Parallel forms reliabilities of the three testing strategies
showed that after 11 items the peaked conventional test yielded higher
reliabilities than either of the adaptive tests. The greatest difference between
reliabilities (.09) occurred between the adaptive and conventional tests at the
30-item test length: the reliabilities of the adaptive tests were r = .81,
compared with a final reliability of r = .90 for the conventional test. The
conventional test reliability was nearly identical to that of the Bayesian test
up to the 10-item test length, but after that point the conventional test
reliability increased more quickly than that of the adaptive tests. Although
adaptive test reliabilities showed signs of leveling off toward the end of the
test, the reliability of the conventional test appeared to increase steadily.

In comparisons of testing strategies scored by other than their optimal scoring
methods, Bayesian scoring of the conventional and maximum information testing
strategies yielded higher reliabilities than maximum likelihood scoring of the
conventional and Bayesian testing strategies. These data indicate that Bayesian
scoring of an adaptive test may yield more stable estimates of ability than
maximum likelihood scoring. The data also illustrate the inappropriateness of
scoring conventional tests with maximum likelihood methods, since extremely low
reliabilities (maximum of r = .75) were obtained at all test lengths. The
correlations between scores on the same testing strategy scored by different
methods showed that the highest correlations were obtained for Bayesian and
proportion-correct scores of the conventional test, with most correlations
between .97 and .99. The second highest level of correlation was between the
Bayesian-scored and maximum-likelihood-scored maximum information test, with
most correlations between .93 and .95. When the maximum information adaptive
test was scored by the Bayesian method, reliabilities of short adaptive tests
were higher than those of the conventional test, and differences in
reliabilities were smaller at longer test lengths.

On the basis of the reliability data, few conclusions can be drawn about the
relative merits of the three testing strategies. Limitations of the item pool
might account in part for the lower reliability of the adaptive tests relative to
the conventional test, since adaptive tests depend heavily on the quality of the
items in the item pool. The item pool used for the two adaptive tests had fewer
items at the extremes of the ability range, and these items had relatively low
discrimination parameters. Especially at abilities where there were fewer items,
it is likely that the correlations between ability estimates would be attenuated
and that the adaptive process would be at a disadvantage as testing progressed,
because toward the end of testing fewer and fewer items would remain available
at a given ability level.

Another factor that limits the comparison of the testing strategies in terms of
alternate forms reliability correlations is the distribution of ability in the
population. Since values of Pearson product-moment correlations depend on the
distributions of the variables involved, different ability distributions can
result in different levels of correlation. Thus, the reliability correlations
confound the distribution of the ability estimates with the measurement
precision of the testing strategies.
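A small simulation illustrates this confounding. Purely for illustration, it assumes that each examinee's score on each form equals true θ plus independent normal error with the same SD for everyone (a classical-test-theory simplification; in IRT the error varies with θ). Holding the error SD fixed at 0.5 while reducing the SD of true ability from 1.0 to 0.5 lowers the expected alternate forms correlation, var(θ) / (var(θ) + var(error)), from .80 to .50, even though measurement precision is identical. Function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def expected_reliability(theta_sd, error_sd):
    """Correlation of two parallel measures: true variance over observed variance."""
    return theta_sd ** 2 / (theta_sd ** 2 + error_sd ** 2)


def simulated_reliability(theta_sd, error_sd, n=20000, seed=0):
    """Correlate two error-laden 'forms' for a sample of simulated examinees."""
    rng = random.Random(seed)
    form_a, form_b = [], []
    for _ in range(n):
        theta = rng.gauss(0.0, theta_sd)
        form_a.append(theta + rng.gauss(0.0, error_sd))  # same error SD on both forms
        form_b.append(theta + rng.gauss(0.0, error_sd))
    return pearson(form_a, form_b)
```

Calling `simulated_reliability(1.0, 0.5)` and `simulated_reliability(0.5, 0.5)` gives values near .80 and .50 respectively, which is why error-of-measurement indices, rather than correlations, are used for the comparison that follows.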

Errors of measurement derived from test information yield comparisons of testing
strategies that are unconfounded by the distribution of the ability estimates.
Comparisons of the testing strategies on the basis of SEMs and posterior
variances showed that at no point on the ability continuum were the errors of
measurement smaller for the conventional test than for the adaptive tests. In
both error of measurement comparisons, measurement was poorer at the low end of
the ability distribution, and both extremes (positive and negative) were less
precisely measured than the center of the ability continuum.

The results indicate that the two adaptive tests yielded about the same level of
measurement precision as each other and that these levels were greater than
those obtained from the conventional test at all levels of ability. Thus, the
adaptive testing strategies yielded scores with greater precision (higher
information, lower errors of measurement) than did the conventional testing
strategy.

Additional research in progress. Since the reliability results of this study
were contrary to expectations and conflicted with other research using a similar
design but different tests (Kingsbury & Weiss, 1980; McBride, 1980), a fourth
test was added to the study. To examine the effects of test difficulty on the
results, this test was a second conventional test in which average item
difficulty was higher than that of the first conventional test. Data were
collected from 530 students on this 60-item conventional test, which consisted
of two embedded 30-item alternate forms. The alternate forms reliabilities of
these forms will be computed at test lengths from 1 to 30 items, and the data
will be further analyzed to permit direct comparisons with the three other
testing strategies.

Future  Research  Plans 

In  addition  to  using  Item  parameters  that  contain  varying  degrees  of  error, 
real  adaptive  testing  Item  pools  may  deviate  from  the  unldlmenslonal  IRT  model 
that  has  been  applied  In  all  adaptive  testing  simulations.  Since  deviations 
from  unldlmenslonality  (e.g.,  Bejar,  1979,  1980;  Bejar,  Weiss,  &  Kingsbury, 

1977;  Reckase,  1978)  can  potentially  affect  the  performance  of  adaptive  testing 
strategies,  a  series  of  monte  carlo  simulation  studies  will  be  constructed 
around  the  degrees  and  types  of  dimensionality  observed  In  ability  test  data. 


These studies will generate testee responses from the underlying
multidimensional structures observed in ability test items, while adaptive
branching will be carried out by several adaptive testing strategies designed
for unidimensional adaptive testing. The research question will thus be the
effect of violating the unidimensionality assumption on the performance of
adaptive testing strategies. Again, the evaluative criteria will consist of
information, bias, the correlation of true ability with ability estimates, and
other characteristics of the ability estimates derived from the unidimensional
adaptive testing strategies.
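The response-generation step of such a study might look like the sketch below. It assumes a compensatory two-dimensional logistic model in intercept form, P = c + (1 - c) / (1 + exp(-(a1*θ1 + a2*θ2 + d))), and standard-normal abilities; the report does not specify the multidimensional model, and all names here are hypothetical.

```python
import math
import random


def mirt_probability(theta, a, d, c=0.0):
    """Compensatory multidimensional logistic model (assumed form):
    P = c + (1 - c) / (1 + exp(-(a . theta + d)))."""
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def correlated_abilities(rho, rng):
    """Two standard-normal abilities whose population correlation is rho."""
    t1 = rng.gauss(0.0, 1.0)
    t2 = rho * t1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return (t1, t2)


def simulate_response(theta, a, d, c, rng):
    """1 (correct) with the model probability, otherwise 0."""
    return 1 if rng.random() < mirt_probability(theta, a, d, c) else 0


# Usage: the response depends on two dimensions, but a unidimensional
# adaptive strategy would branch and score as if only one ability existed.
rng = random.Random(1981)
theta = correlated_abilities(0.6, rng)
u = simulate_response(theta, a=(1.2, 0.5), d=-0.3, c=0.2, rng=rng)
```

Varying rho and the size of the second discrimination is one way to control how far the simulated data depart from unidimensionality.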


Item  Formats  and  Response  Modes 

The use of interactive computers to administer ability tests allows the
design and use of test items that do not use the typical multiple-choice item
format. Research on alternatives to the typical multiple-choice item (Bejar,
1975; Vale, 1977) suggests that response modes other than multiple-choice items
can considerably improve the utilization of information in adaptive testing.
Consequently, continued research in this area was indicated.

Objective 

To evaluate the utility of a number of response modes and item formats for
adaptive testing.

Approach 

This objective was pursued using the six item types shown in Figure 2. The
studies were concerned with the following characteristics of these item types:

1. The relationship of responding in the various formats to ability levels.

2. The reliability of test item responses and ability estimates obtained
in the various formats.

3. Information characteristics of items and tests utilizing the various
formats.

4. The relationships among test item responses using the various formats.

5. The relative validity of responses obtained from the various formats.

6. The generality of findings obtained from the different response formats
to different populations and different ability dimensions.

7. The comparative factor structure of tests administered in the various
formats.

The research was designed to study the characteristics of the six item
formats in several ability areas. It is essentially a search for an item format
that allows testees to express as much of the knowledge they have about a given
question as possible while minimizing the effects of guessing. The results
obtained from this series of studies will be used to select several item formats
for use in computerized adaptive testing.


Two  sets  of  30  multiple-choice  items  were  chosen  from  available  item  pools. 
One set of items was chosen from a pool of analogy items, and the second set was
chosen from a pool of arithmetic reasoning items. Both item pools included item


Figure  2 

Description  of  Response  Formats 


1. Multiple-choice items with conventional response format. These items were
conventional multiple-choice items with four alternatives. The examinee was
asked to choose the correct answer.

Example. Procedure : Activity
1. Diplomacy : Tact
*2. Itinerary : Journey
3. Minutes : Committee
4. Index : Book


2. Multiple-choice items with probabilistic response format. These items were
exactly the same as the conventional multiple-choice items, but the
examinees were asked to assign 100 points among the four alternatives to
indicate their confidence in the "correctness" of each alternative.

Example. Procedure : Activity (Possible Answer)
1. Diplomacy : Tact            0
2. Itinerary : Journey        75
3. Minutes : Committee        25
4. Index : Book                0

3. Dichotomous items with a yes-no (dichotomous) response format. For these
items only the item stem and one alternative were presented. The examinees
were asked to respond "yes" if they thought the alternative provided was a
correct answer to the question, and "no" if they thought it was not.

Example. Q. Procedure : Activity (Possible Answer)
A. Index : Book    Yes

4. Dichotomous items with a probabilistic response format. These items were
identical to the dichotomous items with a yes-no response format, but the
examinees were asked to respond with a probability (a number from 0 to 100)
reflecting their confidence that the alternative provided was a correct
answer to the question.

Example. Q. Procedure : Activity (Possible Answer)
A. Index : Book    10


5. Free-response items with a conventional response format. For these items
only the item stem was presented. The examinees were asked to provide their
own answers.

Example. Q. Procedure : Activity
Itinerary : ____


6. Free-response items with a probabilistic response format. Once again only
the item stem was presented, but this time the examinees were asked to
provide an answer to the question and to assign a probability (a number from
0 to 100) to that answer to indicate their confidence in the
"correctness" of their response.

Example. Q. Procedure : Activity (Possible Answer)
Itinerary : Trip    90




parameters calculated on large numbers of individuals using the three-parameter
logistic model of LOGIST (Wood & Lord, 1976; Wood, Wingersky, & Lord, 1976).
The items were chosen to represent a uniform range of difficulty and
discrimination parameters. Each set of 30 items was then modified to conform to
the item formats shown in Figure 2. For each item, the item stem remained the
same while the response format was changed, resulting in six sets of 30 items,
each with a different response format. Tests were then constructed from these
items and administered in various combinations to a number of groups of
several hundred college students.

The data collected will be used to determine the factor structure and the
convergent and discriminant validity of the tests using the different response
formats in the two ability domains, and will be compared with similar data
already available on vocabulary ability. The data will also be used to compare
the item parameters, item and test information functions, internal consistency,
and interrelationships among scores derived from the various response formats.

Results 


Preliminary data analyses were completed on data collected on three of the
six item formats using the analogies items. The formats analyzed were
multiple-choice items with conventional response format (MCC), multiple-choice
items with probabilistic response format (MCP), and dichotomous items with a
dichotomous response format (DD) (Types 1 through 3 in Figure 2).

Table 3 shows the average item scores for the 30 items in the three formats.
The MCC and MCP items were rank ordered very similarly, as evidenced by the
correlation of .91 between the average item scores for these two formats. The
DD items ordered themselves somewhat differently, as indicated by the
correlations of the DD average item scores with the MCC and MCP average item
scores, which were .43 and .52, respectively. Although the present results
reflect only one scoring system for the MCP items, several other scoring
methods will also be investigated for this response format.
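The item-level correlations can be recomputed from the Table 3 values as sketched below. The sketch assumes a Pearson correlation across the 30 item means (the report does not name the coefficient). With the rounded Table 3 values, the MCC-MCP correlation comes out near the reported .91.

```python
from math import sqrt

# Average item scores from Table 3, items 1-30.
MCC = [.86, .64, .78, .52, .73, .79, .28, .48, .61, .58, .60, .89, .77, .62, .75,
       .78, .64, .84, .41, .68, .87, .84, .87, .91, .68, .77, .75, .86, .54, .35]
MCP = [1.18, .66, .99, .58, 1.02, 1.08, .24, .64, .92, .80, .73, 1.19, 1.08, .77, 1.29,
       1.07, .94, 1.48, .63, 1.05, 1.14, 1.06, 1.32, 1.41, 1.08, .88, 1.25, 1.29, .60, .42]
DD = [.87, .42, .65, .81, .80, .52, .64, .61, .51, .73, .44, .71, .79, .32, .82,
      .92, .70, .87, .80, .84, .98, .65, .78, .98, .56, .80, .83, .94, .58, .66]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


pairs = {"MCC-MCP": (MCC, MCP), "DD-MCC": (DD, MCC), "DD-MCP": (DD, MCP)}
for name, (x, y) in pairs.items():
    print(name, round(pearson(x, y), 2))
```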

Table 4 shows validity and internal consistency reliability coefficients
obtained for the three item formats. The validity coefficient reported is the
correlation of total score with the students' self-reported grade-point average
(later analyses are planned using actual rather than reported GPA). The
validity coefficients for the three formats did not differ significantly from
one another, and all were significantly different from zero. The MCP items had
the highest internal consistency reliability but the lowest validity. The MCC
items had the second highest reliability and the highest validity, and the DD
items had the lowest reliability with moderate validity.
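The internal consistency coefficients in Table 4 are coefficient alpha. A minimal sketch of the standard formula, alpha = k/(k - 1) × (1 - Σ item variances / total-score variance), is shown below. It works for 0/1 scores (MCC, DD) and for continuous item scores such as the MCP scores alike; the function name is hypothetical, and this is not necessarily the program used in the study.

```python
def coefficient_alpha(scores):
    """Coefficient alpha for a persons x items score matrix (list of rows)."""
    k = len(scores[0])

    def variance(values):
        # Population variance; the n vs. n - 1 choice cancels in the ratio below.
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)
```

The validity coefficient, by contrast, is simply the Pearson correlation of each student's total score with reported GPA.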

A number of factor analyses were performed on the data from the three item
formats, using both principal axes and confirmatory analyses. The principal
axes analyses extracted two orthogonal factors for each response format; the
pattern of positive and negative loadings was extremely similar for the MCC
and MCP items but very different for the DD items. In addition, the percent of
total score variation accounted for by the two-factor solution varied with item
format.


Confirmatory  factor  analysis  was  performed  using  the  principal  axes  factor 




Table 3

Average Item Scores for the Same 30 Analogies Items in
Multiple-Choice Conventional Format (MCC),
Multiple-Choice Probabilistic Format (MCP),
and Dichotomous-Dichotomous (DD) Format

Item         Response Format
Number    MCC*    MCP**    DD*
  1       .86     1.18     .87
  2       .64      .66     .42
  3       .78      .99     .65
  4       .52      .58     .81
  5       .73     1.02     .80
  6       .79     1.08     .52
  7       .28      .24     .64
  8       .48      .64     .61
  9       .61      .92     .51
 10       .58      .80     .73
 11       .60      .73     .44
 12       .89     1.19     .71
 13       .77     1.08     .79
 14       .62      .77     .32
 15       .75     1.29     .82
 16       .78     1.07     .92
 17       .64      .94     .70
 18       .84     1.48     .87
 19       .41      .63     .80
 20       .68     1.05     .84
 21       .87     1.14     .98
 22       .84     1.06     .65
 23       .87     1.32     .78
 24       .91     1.41     .98
 25       .68     1.08     .56
 26       .77      .88     .80
 27       .75     1.25     .83
 28       .86     1.29     .94
 29       .54      .60     .58
 30       .35      .42     .66

*Proportion correct.
**Average score with range from 2.00 to -1.00.


solution for the MCC items as the basis for the model. The data fit the model
better for the MCP items than for the DD items, and the second factor was
rather inconsequential for all three response formats.

The results obtained so far therefore argue against the use of the DD item
format. This item format is less reliable than the other response formats, has
only moderate validity, shows high levels of guessing, and does not appear to
measure analogies ability consistently. The DD response format does not appear
to be a viable alternative to the multiple-choice item. On the other hand,
the MCP item format does appear to be a promising alternative to the traditional




Table  4 

Alpha  Internal  Consistency  Reliability  Coefficients 
and  Validity  Correlations  with  Reported  GPA 
for  Three  Response  Formats 


Response Format                  N    Coefficient Alpha    Validity Coefficient
Multiple-Choice Conventional    486          .85                   .23
Multiple-Choice Probabilistic   299          .91                   .17
Dichotomous-Dichotomous         303          .59                   .20

multiple-choice item. It is more reliable, nearly as valid, and appears to
measure analogies ability to a greater degree than conventional multiple-choice
items.

Additional Research in Progress


Further research and analysis, of course, are needed to determine whether
these results generalize across ability domains, and whether any of the item
formats not yet analyzed will also show promise as alternatives to the standard
multiple-choice item. Considerable amounts of additional data will be obtained
on these item formats using both the analogies and numerical reasoning items,
and the resulting data will be analyzed to investigate the questions raised
earlier. In addition, further evidence of the generality of the findings will
be sought using vocabulary ability items that are being analyzed by similar
methodologies. Once suitable item types are identified as replacements for the
multiple-choice item, adaptive tests using these item types will be designed
and their characteristics investigated.

Fit  of  Individuals  to  Item  Response  Theory  Models 

Previous research on the person response curve (Trabin & Weiss, 1979) and
related research on person fit (Levine & Drasgow, 1980; Levine & Rubin, 1979)
promise the capability of identifying individuals who, on a given test, are not
responding in accordance with a given IRT model. This lack of fit to the model
can derive from a number of possible causes, including the following:

1. Lack of motivation to respond appropriately to the test;

2. Inappropriate or nonrandom guessing, or lack of guessing;

3. Responses that are not in accord with the unidimensionality assumption
of ICC theory.

Knowing that an individual is not responding according to the model for any of
these reasons would be useful in applied situations, signaling that the
person's scores should be considered carefully before decisions are made on the
basis of them. Person fit may also be an important moderator variable in
prediction studies.




Objective 


To further investigate the utility of the person response curve (PRC) concept
and to identify the correlates of deviations of an individual from the IRT
model.

Approach 

To properly investigate the usefulness of the PRC approach to the problem
of person fit, its characteristics were investigated within the context of other
indices designed for similar purposes. Based on a thorough review of the
relatively small literature on person fit (also known as "appropriateness"
measurement), the following indices were identified:

1. Trabin and Weiss's (1979) chi-square index of the fit of observed and
expected PRCs.

2. Reckase's (1977) mean square deviation (MSD) index averaged over items,

$$\mathrm{MSD}_j = \frac{1}{N} \sum_{i=1}^{N} \left(u_{ij} - P_{ij}\right)^2$$

where

$\mathrm{MSD}_j$ = the mean squared deviation for person $j$,
$u_{ij}$ = the actual response to item $i$ by person $j$,
$P_{ij}$ = the probability of a correct response as predicted by the
three-parameter IRT model, and
$N$ = the number of items in the test.

This index is a special case of the fit of the observed PRC
to the expected PRC.

3. A variation of Reckase's index in which only improbable responses are
scored. It is difficult to argue that responses in the predicted direction
should be included in a person-fit statistic. Hence, a term is included in
the statistic only where $(u_{ij} - P_{ij})^2 > .25$, that is, where the
observed response had a model probability below .5. The divisor of the
statistic is still the number of items in the test.

4. The likelihood ratio index of person fit. For a given θ value, a
probability distribution for the possible response vectors can be generated.
The probability of an individual's actual response vector is divided by the
probability of the most likely response vector to produce the likelihood
ratio.

5. Wright (1968) proposed an index of person or item fit, which is the sum
of the standardized squares of the residuals after fitting the model.
For the Rasch model the standardized squared residual is $e^{(\theta - b)}$
for an incorrect answer and $e^{(b - \theta)}$ for a correct answer, where
$e \approx 2.718$ is the base of natural logarithms, $\theta$ is the ability of
the individual, and $b$ is the difficulty of the item.


6. Response pattern information, which reflects the flatness of the
likelihood function.

7. The three appropriateness indices described by Levine and Drasgow (1980)
and Levine and Rubin (1979).

8. The difference between the difficulties of the easiest item answered
incorrectly and the most difficult item answered correctly.

9. Donlon and Fisher's (1968) "personal biserial correlation" and Jacobs'
(1963) variation of it.

10.  The  posterior  variance  of  IRT-based  Bayesian  ability  estimates. 
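Several of the indices listed above can be written directly from their definitions. The sketch below covers indices 2, 3, 4, 5, and 8 for a single response vector. It assumes the three-parameter logistic model with the conventional scaling constant D = 1.7, treats item parameters and θ as known, and adopts a sign convention for index 8 (hardest item passed minus easiest item failed) that the report does not specify. All function names are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant for the three-parameter logistic model


def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def msd(u, p):
    """Index 2: mean squared deviation, sum of (u_i - P_i)^2 divided by N."""
    return sum((ui - pi) ** 2 for ui, pi in zip(u, p)) / len(u)


def msd_improbable(u, p):
    """Index 3: only terms with (u_i - P_i)^2 > .25 are summed; divisor is still N."""
    terms = ((ui - pi) ** 2 for ui, pi in zip(u, p))
    return sum(t for t in terms if t > 0.25) / len(u)


def likelihood_ratio(u, p):
    """Index 4: P(observed vector) / P(most likely vector), both at the same theta.

    Under local independence the most likely vector answers item i correctly
    exactly when P_i >= .5, so each item contributes max(P_i, 1 - P_i).
    """
    ratio = 1.0
    for ui, pi in zip(u, p):
        observed = pi if ui == 1 else 1.0 - pi
        ratio *= observed / max(pi, 1.0 - pi)
    return ratio


def wright_fit(u, theta, difficulties):
    """Index 5 (Rasch): sum of squared standardized residuals, which equal
    exp(b - theta) for a correct answer and exp(theta - b) for an incorrect one."""
    return sum(math.exp(b - theta) if ui == 1 else math.exp(theta - b)
               for ui, b in zip(u, difficulties))


def difficulty_gap(u, difficulties):
    """Index 8, with an assumed sign: hardest item passed minus easiest item failed.
    Positive values mean a harder item was passed while an easier one was failed."""
    passed = [b for ui, b in zip(u, difficulties) if ui == 1]
    failed = [b for ui, b in zip(u, difficulties) if ui == 0]
    if not passed or not failed:
        return None  # undefined for all-correct or all-incorrect vectors
    return max(passed) - min(failed)
```

In use, P would be computed as `[p3pl(theta, a, b, c) for a, b, c in items]` for the person's θ and then passed to the first three indices. Index 4 equals 1 for the most likely vector and shrinks toward 0 as the pattern becomes less probable.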

The utility of these indices for identifying individual nonfit to IRT models
is first being studied in simulation. As a first step, the null distributions
of these person-fit statistics are being examined to serve as reference points
for the later studies that will determine how well each statistic identifies
person nonfit. Using a set of 2,500 simulees rectangularly distributed at 25
equally spaced levels of θ, 625 items were administered to each simulee. Items
were rectangularly distributed in b, with 25 items at each of 25 levels of b
corresponding to the 25 levels of θ. Within each level of b, items varied in
discrimination at .07 intervals. Simulation data were generated separately for
the 2,500 simulees for c = 0.0, .20, and .25 in order to examine the effects of
guessing on the null distributions. To examine the effects of test length and
discrimination, shorter tests were selected from the item pool at differing
levels of a. For each of these test configurations, null distributions were
computed for each of the person-fit indices.
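The null-distribution design can be sketched as below. The counts (25 θ levels, 2,500 simulees, 625 items, .07 discrimination steps, and the three values of c) come from the text. The θ range of -3 to +3, the starting discrimination of 0.5, and the scaling constant D = 1.7 are assumptions, and the function names are hypothetical.

```python
import math
import random

D = 1.7                                        # assumed logistic scaling constant
LEVELS = [-3.0 + 0.25 * k for k in range(25)]  # assumed: 25 equally spaced levels, -3 to +3
A_START, A_STEP = 0.5, 0.07                    # assumed starting a; .07 steps per the report


def build_item_pool(c):
    """625 items: 25 difficulty levels (matching the theta levels) x 25 discriminations."""
    return [(A_START + A_STEP * j, b, c) for b in LEVELS for j in range(25)]


def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate(c, per_level=100, seed=0):
    """Model-conforming 0/1 response vectors; per_level=100 gives 2,500 simulees."""
    rng = random.Random(seed)
    pool = build_item_pool(c)
    data = []
    for theta in LEVELS:
        for _ in range(per_level):
            vector = [1 if rng.random() < p3pl(theta, a, b, ci) else 0
                      for a, b, ci in pool]
            data.append((theta, vector))
    return pool, data
```

Running `simulate(c)` for c in (0.0, 0.20, 0.25) and evaluating each person-fit index on every vector yields one empirical null distribution per index and guessing level. Shorter tests can then be formed by selecting subsets of pool positions.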

After the distributions of these indices are known for model-conforming data,
increasing amounts of random responses will be added to the response vectors to
simulate random responding, inattention or low motivation, and guessing.
Changes in the distributions that result from increasingly random responding
will be noted. In this phase, the percent of random responses is an additional
independent variable. Those person-fit statistics that are affected most
strongly by random responding will be retained as good candidates for further
live-data research. Results of the data analysis are not yet available from
these studies.
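One way to add random responses to a model-conforming vector is sketched below. It assumes that the replaced items are chosen at random and that a "random response" is a coin flip (probability .5 of being scored correct); a guessing model for k-choice items would use 1/k instead. The function name is hypothetical.

```python
import random


def add_random_responses(responses, proportion, rng):
    """Replace a proportion of a 0/1 response vector with random responses.

    Assumptions: the replaced positions are chosen at random, and a "random
    response" is a coin flip (probability .5 of being scored correct).
    """
    n_replace = round(proportion * len(responses))
    positions = rng.sample(range(len(responses)), n_replace)
    contaminated = list(responses)
    for i in positions:
        contaminated[i] = 1 if rng.random() < 0.5 else 0
    return contaminated
```

Sweeping `proportion` over, say, .10, .20, ..., and recomputing each index shows how quickly its distribution moves away from the null distribution.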

Additional  Research  Plans 

The effect of multidimensionality on the person-fit indices will be examined
by generating response data from a θ known to correlate to varying degrees
with the original θ and inserting these responses into the response vectors.
Degree of correlation as well as number of dimensions will be manipulated. The
dependent variable will be the ability of the person-fit indices to identify
the existence of the multidimensional response patterns.
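The insertion of responses from a second, correlated θ can be sketched as follows. The sketch assumes standard-normal abilities (so that rho is exactly the population correlation) and three-parameter logistic items with D = 1.7. With the rectangular θ levels of the null-distribution design, the second ability would need rescaling to reach a target correlation. Names are hypothetical.

```python
import math
import random


def second_theta(theta, rho, rng):
    """A second ability correlated rho with theta (both standard normal in the population)."""
    return rho * theta + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)


def insert_second_dimension(responses, items, theta, rho, positions, rng, D=1.7):
    """Regenerate responses at the given positions from a second ability.

    items: list of (a, b, c) three-parameter logistic parameters aligned with responses.
    Returns the modified vector and the second ability that generated it.
    """
    theta2 = second_theta(theta, rho, rng)
    out = list(responses)
    for i in positions:
        a, b, c = items[i]
        p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta2 - b)))
        out[i] = 1 if rng.random() < p else 0
    return out, theta2
```

The number of inserted positions and the value of rho correspond to the two manipulated factors described above.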

Once promising person-fit indices are identified in the simulation studies,
live-testing studies will be designed in which the fit of persons to the IRT
model is empirically tested on a given pool of items. Experimental studies will
be designed to attempt to induce deviations from the model in groups of
individuals and to observe whether the person-fit indices are sufficiently
sensitive to identify those deviations when they exist. In addition, naturally
occurring groups in which deviation from the IRT model might occur will be
studied. For example, it can be hypothesized that on certain kinds of subtests
(e.g., verbal ability tests) students from a non-English-speaking culture would
likely show significant deviations from unidimensionality. It could also be
hypothesized that students who are not "test-wise" or who are unfamiliar with
multiple-choice tests would show specific deviations from IRT models in their
test performance. In addition, the generality of person-fit variables across
ability dimensions will be studied in order to determine whether an
individual's deviations from the model are specific to an item pool or occur
across different kinds of item domains.

New  Types  of  Ability  Tests 


Psychometric attempts to measure individual differences in cognitive
abilities during the last 60 years have produced tests of global abilities such
as general reasoning, verbal and quantitative abilities, as well as more
specific ability "factors" such as speed and flexibility of closure, spatial
orientation, and word fluency (French, Ekstrom, & Price, 1963). Research in
adaptive testing has shown that the precision and validity of ability tests can
be improved by adaptive testing procedures. At the same time, cognitive
psychologists have developed several standard tasks or paradigms that have been
used to study the mechanisms and structure of aspects of memory, attention, and
cognition (e.g., Sperling, 1960; Sternberg, 1966). Attempts to assess
quantitative differences between individuals in such information-processing
abilities and to relate them to more traditional psychometric measures are a
fairly recent phenomenon (Chiang & Atkinson, 1976; Day, 1977; Hunt, Frost, &
Lunneborg, 1973; Hunt, Lunneborg, & Lewis, 1975; Lunneborg, 1977; Rose, 1974).
Carroll (1976) has provided an additional framework for relating traditional
psychometric factors to their cognitive information-processing requirements.
Some of this research has suggested that individual differences in such
information-processing abilities can be measured reliably by interactive
computers and that these abilities may add incremental validity in predicting
external job criteria (Cory, 1977; Cory, Rimland, & Bryson, 1977).

In the past, tasks of the type used by cognitive psychologists to measure
information-processing abilities and by psychometricians to measure perceptual
and spatial factors have been administered as blocks of fixed numbers of trials
or replications. As a result, little is known about using the important
parameters of these tasks to adapt the difficulty level of replications so as
to converge upon an individual's ability level. A major emphasis of the
research, therefore, involved (1) studying some of these tasks from the point of
view of how computerized adaptive administration could be meaningfully achieved
and (2) evaluating the measurement benefits of adaptive administration of
ability tests designed to utilize the unique capabilities of computerized
administration.

Objective 

The objective was to investigate the application of adaptive testing
techniques to improving the measurement characteristics of several cognitive
information-processing tasks (e.g., short-term memory span, capacity of visual
sensory memory). Computerized administration of these ability tests will be
investigated as a means of modifying task presentations over time in a way that
would not be possible in paper-and-pencil testing.




Approach 

Two types of ability test items that utilized the unique capabilities of
computer administration (Memory for Patterns and Digit Span) were studied to
investigate the feasibility of adaptive administration of information-processing
tests. These tasks were studied from the point of view of enabling the computer
to adapt the difficulty of the tasks presented to the ability level of the
testee during testing.

Memory for Patterns. This test was based on a procedure devised by
Sperling (1960) for determining the capacity of visual sensory memory. The
procedure is based on presentation of arrays of letters that must be studied by
the testee. The procedure will be modified to adapt to a testee's recall on
previous screen presentations in order to obtain precise quantitative estimates
of individual differences in visual sensory memory capacity with fewer
replications. For example, the size of the array (and thus the memory capacity
demands) can be made larger or smaller based on an individual's previous
performance, and/or the duration of the array presentation could be lengthened
or shortened to adapt the task difficulty to the testee's ability level during
the course of testing. Prior to implementing such an adaptive approach, however,
psychometric research is needed to determine how much of an increase or decrease
in stimulus array size constitutes a meaningful increment or decrement in
difficulty and what ranges of array size are needed to adequately span
differences existing in various populations of interest.
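As an illustration of adapting array size to previous performance, the sketch below uses a one-up/one-down staircase: the array grows by one letter after a correct response and shrinks by one after an error, within the 3- to 10-letter range used for these items. This rule is a hypothetical choice, not the report's method. As the text notes, the appropriate step size is itself an open research question. A one-up/one-down rule tends to hover around the array size at which the testee is correct about half the time.

```python
def next_array_size(size, correct, step=1, min_size=3, max_size=10):
    """One-up/one-down rule (hypothetical): larger array after a correct
    response, smaller after an error, kept within the 3- to 10-letter range."""
    size = size + step if correct else size - step
    return max(min_size, min(max_size, size))


def run_staircase(start, outcomes, **kwargs):
    """Sequence of array sizes given a start size and a list of True/False outcomes."""
    sizes = [start]
    for correct in outcomes:
        sizes.append(next_array_size(sizes[-1], correct, **kwargs))
    return sizes
```

Presentation duration could be adapted with the same rule by stepping through a list of exposure times instead of letter counts.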

Data collection on two tests of short-term spatial and perceptual memory
was therefore designed to allow a preliminary evaluation of potential adaptive
testing parameters. The experimental Memory for Patterns items consisted of
bounded two-dimensional arrays containing 3 to 10 letters. Each item consisted
of a pair of successive screen presentations. The stimulus display was
presented for a brief timed period and then was erased from the cathode ray
terminal (CRT) and replaced with the recall display. The recall display
contained a bounded letter pattern that was identical to the first pattern (the
stimulus display) except that one or two letters had changed position. The
recall display was untimed and accepted the student's response indicating which
letter(s) she/he thought had moved. Figure 3 shows a sample Memory for Patterns
stimulus and recall display, constituting one test item. Data were also
collected on a related set of items designed to measure Space Memory, which were
similar to the Memory for Patterns items except that the letter patterns
presented were unbounded and thus spread about the entire CRT screen.

In the initial data collection for the Memory for Patterns and Space Memory
items, subtests varied in the number of letters that could change position from
the first pattern to the second. Three 10-item subtests were administered to
each student in a number of different experimental groups. The first subtest
was composed of patterns in which one letter changed position, the second
subtest was composed of patterns in which two letters changed position, and the
third subtest included patterns in which either one or two letters changed
position. Students were assigned to one of four conditions that varied the order
in which the items were presented and the presentation time, in seconds, of the
first pattern of each item pair. In the Memory for Patterns test the two
presentation times were 5 and 10 seconds, and in the Space Memory test they were
7 and 12 seconds. The two orders in which items were presented were (1) from
lowest to highest difficulty and (2) from moderate to highest to lowest
difficulty,


Figure  3 

A  Sample  Memory  for  Patterns  Item,  with  Stimulus  Display 
on  the  Left  and  Recall  Display  on  the  Right 


Type the letter which has changed
position, or a question mark (?).
Then press "RETURN".

[Letter arrays not recoverable from the scanned figure; the displays contain
the letters F, B, M, V, D, Z, K, and O.]

where difficulty was indexed by the number of letters in the pattern. Reactions
to the tests were also obtained from each student.

The analysis of the data was directed at scaling the difficulty of items
and at studying the effects of pattern density and presentation times on item
difficulty indices. A comparison of the item order conditions was made to
examine the effects of practice and proactive inhibition on indices of item
difficulty. A comparison of presentation time conditions was directed at
determining reasonable item exposure times and at investigating the possibility
of using presentation time along with pattern density and number of pattern
changes as adaptive parameters for future adaptive testing.

The results of preliminary analyses of the Memory for Patterns items were
used to design new items and to modify the experimental conditions under which
they were administered. The major conclusion from the preliminary analysis was
that item difficulty was more a function of type of pattern configuration than
of either rate or order condition. For this reason, types of Memory for Patterns
items were defined systematically. Eight Memory for Patterns item types, which
can be separated into two groups, were developed. One group of items is
composed of patterns taking a geometric form, such as a line, triangle,
square, or pentagon. Four item types of this nature were developed:

1. An item that had a geometric form was changed to a nongeometric form
through a small move,

2. An item that had a geometric form was changed to a nongeometric form
through a large move,

3. An item that had a nongeometric form was changed into a geometric form
through a small move, and

4. An item that had a nongeometric form was changed into a geometric form
through a large move.

Within these four item types, the patterns could have various degrees of
nongeometric form.

A second group of items was composed of nongeometric forms, but the
patterns varied in how well the pattern configuration was defined. For
example, some patterns, although not geometric in form, are better defined and
thus easier to remember. The four item types were as follows:

1. Well-defined pattern configuration with a small change in letter
configuration,

2. Well-defined pattern configuration with a large change in letter
configuration,

3. Poorly defined pattern configuration with a small change in letter
position, and

4. Poorly defined pattern configuration with a large change in letter
position.

Hypotheses were formulated relating item type to the resulting item
difficulty. Ninety new items were written based on these eight basic Memory for
Patterns item types.

Of the 90 items, 42 were administered under various experimental conditions. Students were assigned sequentially to one of 15 testing conditions.

The 42 Memory for Patterns items were presented under three order conditions crossed with five rate conditions (3 × 5 = 15 testing conditions). Pattern density, the number of letters in a configuration, took the values 3, 4, 5, 6, 7, 8, 9, and 10 letters. The 42 items were presented in three orders: (1) ordered from 3 to 10 letters in a pattern; (2) ordered from 6 to 10 letters, then 3 to 5 letters in a pattern; and (3) ordered from 8 to 10, then 3 to 7 letters in a pattern. The five rate conditions were 3, 5, 7, 10, and 13 seconds in duration. Analyses of these data will be oriented toward identifying adaptive parameters for the items and the effects of administration conditions (e.g., sequence effects) on those adaptive parameters.
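The 15 testing conditions can be reconstructed as the crossing of the three density orders with the five presentation durations. The sketch below is illustrative only. It assumes that "assigned sequentially" means a round-robin cycle through the conditions in a fixed listing order (order 1 at 3 seconds, then order 1 at 5 seconds, and so on), which the report does not state, and the function names are hypothetical.

```python
from itertools import product

# Pattern densities (letters per configuration): 3, 4, ..., 10.
DENSITIES = list(range(3, 11))


def density_order(start):
    """Densities from `start` up to 10, followed by 3 up to start - 1."""
    upper = [d for d in DENSITIES if d >= start]
    lower = [d for d in DENSITIES if d < start]
    return upper + lower


# The three orders described in the text begin at 3, 6, and 8 letters.
ORDERS = {1: density_order(3), 2: density_order(6), 3: density_order(8)}

# The five presentation durations, in seconds.
RATES = [3, 5, 7, 10, 13]

# Crossing 3 orders with 5 rates gives the 15 testing conditions.
CONDITIONS = list(product(ORDERS, RATES))


def assign_condition(student_index):
    """Hypothetical round-robin assignment: the i-th student (0-based)
    receives condition i mod 15, returned as an (order, seconds) pair."""
    return CONDITIONS[student_index % len(CONDITIONS)]
```

Under this assumption, `assign_condition(6)` returns `(2, 5)`: the seventh student receives density order 2 with a 5-second display, and the sixteenth student starts the cycle again.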

Digit Span. In contrast to the Memory for Patterns tests, which tap both spatial abilities and short-term memory capacity, the Digit Span test is primarily a test of short-term memory. In this test a series of digits is presented in rapid succession. The respondent is asked to recall the digits in the order they were presented by typing them into the computer terminal as a single string. Since the interval between presenting the digits, clearing the screen, and asking for a response was very short, the test was essentially an indicator of short-term memory capacity.

To identify possible adaptive parameters for this type of test, presentation rate was varied experimentally so that items were presented at one of three rates (.2 seconds, .3 seconds, and .5 seconds), corresponding to fast, moderate, and slow rates. Series length (the number of stimulus values to be recalled), a second potential adaptive testing parameter, varied from 4 to 10 stimuli. Six items were administered at each of these seven series lengths, yielding a total of 42 digit series in the test. Series length and presentation rate will be investigated as possible test and item parameters.
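The Digit Span design (seven series lengths from 4 to 10 digits, six series at each length, three presentation rates) can be sketched as follows. This is a hypothetical illustration: it assumes that digits are drawn uniformly at random with replacement, that the presentation rate applies to each digit, and that a response is scored correct only if the entire typed string matches the presented series. The report does not specify how series were generated or how partial recall was scored.

```python
import random

# Presentation rates in seconds; assumed here to apply to each digit.
RATES = {"fast": 0.2, "moderate": 0.3, "slow": 0.5}
SERIES_LENGTHS = range(4, 11)  # 4 to 10 digits: seven lengths
ITEMS_PER_LENGTH = 6           # 7 lengths x 6 items = 42 series


def make_series(length, rng):
    """Hypothetical generator: digits 0-9 drawn uniformly with replacement."""
    return "".join(str(rng.randrange(10)) for _ in range(length))


def build_test(seed=0):
    """All 42 series, grouped by length from 4 to 10 digits."""
    rng = random.Random(seed)
    return [make_series(n, rng)
            for n in SERIES_LENGTHS
            for _ in range(ITEMS_PER_LENGTH)]


def exposure_time(length, seconds_per_digit):
    """Total display time for one series under the per-digit assumption."""
    return length * seconds_per_digit


def score_response(presented, typed):
    """Strict serial-recall scoring (an assumption): 1 only if every digit
    was typed in the presented order, otherwise 0."""
    return int(typed.strip() == presented)
```

With these assumptions a 10-digit series at the slow rate is on the screen for 5 seconds in total, and typing "58231" for the series "58213" scores 0 because two digits are transposed.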

The items were presented in three different orders: (1) from easy to difficult (where difficulty was defined in terms of the number of stimulus values to recall), (2) from moderately difficult to difficult to easy, and (3) from difficult to easy to moderately difficult. The order of item presentation will be analyzed to determine whether there are practice or inhibitive effects from one item to the next, since such effects would be undesirable in adaptive test administration, and to determine the effects of series length and display time on item difficulties.
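The three presentation orders can be written as rearrangements of blocks of series lengths. The report does not give the cut points between "easy," "moderately difficult," and "difficult," so the blocks below (4-5, 6-7, and 8-10 digits) are an assumption chosen to parallel the Memory for Patterns orders. The hypothetical helper `first_position` reports where a given length first appears in each order, which is what allows serial position in the test to be separated from series length when looking for practice or inhibitive effects.

```python
# Hypothetical difficulty blocks; the report does not state the cut points.
EASY, MODERATE, DIFFICULT = [4, 5], [6, 7], [8, 9, 10]

ORDERS = {
    1: EASY + MODERATE + DIFFICULT,   # easy -> difficult
    2: MODERATE + DIFFICULT + EASY,   # moderate -> difficult -> easy
    3: DIFFICULT + EASY + MODERATE,   # difficult -> easy -> moderate
}


def item_sequence(order, items_per_length=6):
    """Series lengths in administration order, six series per length."""
    return [n for n in ORDERS[order] for _ in range(items_per_length)]


def first_position(order, length, items_per_length=6):
    """0-based position at which series of `length` first appear."""
    return item_sequence(order, items_per_length).index(length)
```

With six series per length, 10-digit series first appear at position 36 in order 1, 24 in order 2, and 12 in order 3 (counting from 0), so each length occupies a different part of the test across the three orders.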


Analyses  and  Future  Plans 

Data analyses and future data collection will be oriented toward identifying potential adaptive parameters in terms of factors influencing the difficulty of test items. In addition, the influence of undesirable factors, such as sequence effects, inhibitive effects, and other factors that would interfere with adaptive administration, will be investigated in order to permit the design and evaluation of adaptive tests of information-processing abilities.

Other measures of more traditional psychometric factors, such as flexibility of closure, may also benefit from application of a computerized adaptive testing framework. This construct has commonly been measured with variants of the Hidden Figures or Embedded Figures test. The unique capabilities of computerized administration may allow increased validity to be achieved by introducing movement into the stem figure, the alternative response figures, or both. For example, the testee may be required to selectively attend to and articulate the stem figure within a more complex figure that not only contains distracting lines but also grows, shrinks, translates, and/or rotates over time. Adaptive administration can be achieved by modifying the amount of "noise" in the figure and by dynamically adapting the amount and speed of movement in the complex figure. Several psychometric questions can be studied. For example, what size increments in "noise" and movement will allow the computer to converge most efficiently on the testee's ability level? Are amounts of "noise" and movement independent dimensions of difficulty to be manipulated? How is performance under varying degrees of noise and movement to be scored?
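One simple way to adapt the amount of "noise" item by item is an up-down staircase: increase the noise after a correct response, decrease it after an error, and halve the step size at each reversal. This is a standard psychophysical procedure substituted here for illustration, not a procedure proposed in the report; the 0-to-1 noise scale, the starting step of 0.25, and the stopping rule are all hypothetical. The starting and minimum step sizes correspond directly to the question of what size increments allow the most efficient convergence.

```python
def staircase(respond, start=0.5, step=0.25, min_step=0.01, max_items=50):
    """Up-down staircase on a hypothetical 0-1 noise scale.

    `respond(level)` returns True if the testee answers correctly at that
    noise level. Noise goes up after a correct answer and down after an
    error; the step is halved at each reversal, and testing stops when the
    step falls below `min_step` or after `max_items` items.
    Returns the last noise level reached and the (level, correct) history.
    """
    level = start
    last_correct = None
    history = []
    for _ in range(max_items):
        correct = respond(level)
        history.append((level, correct))
        if last_correct is not None and correct != last_correct:
            step /= 2  # reversal: refine the increment
            if step < min_step:
                break
        last_correct = correct
        level += step if correct else -step
        level = min(1.0, max(0.0, level))  # stay on the 0-1 scale
    return level, history
```

With a simulated testee who succeeds whenever the noise level is below 0.63, the procedure stops after seven items at a noise level of 0.640625. Real responses are probabilistic, so in practice the estimate would be noisier; a second staircase, or a joint procedure, would be needed for movement if noise and movement turn out not to be independent dimensions.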

The design and implementation of adaptive versions of these information-processing ability tests raise a host of new questions and problems to be investigated. Beyond identifying adaptive parameters for these kinds of items and ruling out extraneous factors such as sequence effects (which can reduce the effectiveness of the adaptive procedures), new questions will need to be addressed with regard to the design and scoring of the adaptive tests. Design questions will include identifying the functions relating display time to item difficulty and identifying procedures for using this continuous function to adapt the display time of each item to each individual's performance on an item-by-item basis. Since this function may be found to differ for items whose difficulties depend on stimulus characteristics (e.g., pattern density, length of span string), adaptive testing procedures will need to be developed that jointly take into account the combination of discrete and continuous difficulty factors. New scoring procedures may also need to be designed for these kinds of tests, since each testee will receive a set of items selected to match his/her ability level, and the applicability of IRT-based scoring procedures will need to be investigated.
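A display-time function of the kind described above can be combined with a discrete factor in a single logistic model. The sketch assumes, hypothetically, that item difficulty is linear in pattern density (or span length) and decreases with the logarithm of display time, b(d, t) = B0 + B1*d - B2*ln(t), and that the probability of a correct response is a Rasch-type logistic function of theta - b. Under that assumption the display time that places an item's difficulty at the current ability estimate can be solved for directly, and theta can be scored by maximum likelihood. The coefficient values are placeholders, not estimates from these data.

```python
import math

# Placeholder coefficients for a hypothetical difficulty model
#   b(d, t) = B0 + B_DENSITY * d - B_TIME * ln(t)
# where d is pattern density (or span length) and t is display time in seconds.
B0, B_DENSITY, B_TIME = -3.0, 0.5, 1.0
T_MIN, T_MAX = 0.2, 13.0  # allowed display times, in seconds


def difficulty(density, seconds):
    return B0 + B_DENSITY * density - B_TIME * math.log(seconds)


def p_correct(theta, b):
    """Rasch-type probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def choose_display_time(theta_hat, density):
    """Display time that sets b(density, t) equal to theta_hat, where a
    Rasch-type item is most informative, clamped to [T_MIN, T_MAX]."""
    t = math.exp((B0 + B_DENSITY * density - theta_hat) / B_TIME)
    return min(T_MAX, max(T_MIN, t))


def ml_theta(responses, bs, theta=0.0, iters=25, bound=4.0):
    """Newton-Raphson maximum-likelihood estimate of theta from 0/1
    responses and the difficulties of the items actually administered.
    Clipped to [-bound, bound] because the ML estimate does not exist
    when all responses are correct or all are incorrect."""
    for _ in range(iters):
        ps = [p_correct(theta, b) for b in bs]
        gradient = sum(u - p for u, p in zip(responses, ps))
        information = sum(p * (1.0 - p) for p in ps)
        theta = max(-bound, min(bound, theta + gradient / information))
    return theta
```

With the placeholder coefficients and an ability estimate of 0, a 6-letter pattern would be shown for 1 second and an 8-letter pattern for about 2.7 seconds, so the discrete factor (density) and the continuous factor (display time) trade off against each other to hold predicted difficulty at the testee's level.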

Finally, comparisons of computerized adaptive, computerized nonadaptive, and traditional measures of these abilities should be made to determine if more precise and efficient measurement can be achieved through adaptive administration. Where appropriate external criteria are available, predictive validity comparisons should also be made. If the findings from traditional ability testing generalize to these kinds of ability tests, the resulting tests will be shorter, more precise, and more valid and will permit more meaningful measurement of the range of human abilities.


References 


Bejar, I. I. An investigation of the dichotomous, graded, and continuous response level latent trait models. Unpublished doctoral dissertation, University of Minnesota, 1975.

Bejar, I. I. Assessing the unidimensionality of achievement tests. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, April 1979.

Bejar, I. I. A procedure for investigating the unidimensionality of achievement tests based on item parameter estimates. Journal of Educational Measurement, 1980, 17, 283-296.

Bejar, I. I., Weiss, D. J., & Kingsbury, G. G. Calibration of an item pool for the adaptive measurement of achievement (Research Report 77-5). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, September 1977.

Carroll, J. Psychometric tests as cognitive tasks: A new "structure of intellect." In L. Resnick (Ed.), The nature of intelligence. Hillsdale, NJ: Lawrence Erlbaum/Halsted Press, 1976.

Chiang, A., & Atkinson, R. C. Individual differences and interrelationships among a select set of cognitive skills. Memory and Cognition, 1976, 661-672.

Cory, C. H. Relative utility of computerized versus paper-and-pencil tests for predicting job performance. Applied Psychological Measurement, 1977, 551-564.

Cory, C. H., Rimland, B., & Bryson, R. A. Using computerized tests to measure new dimensions of abilities: An exploratory study. Applied Psychological Measurement, 1977, 101-110.

Day, R. S. Systematic individual differences in information processing. In P. G. Zimbardo and F. L. Ruch (Eds.), Psychology and life. Glenview, IL: Scott, Foresman, 1977.

Donlon, T. F., & Fischer, F. E. An index of an individual's agreement with group-determined item difficulties. Educational and Psychological Measurement, 1968, 105-113.

French, J. W., Ekstrom, R. B., & Price, L. A. Kit of reference tests for cognitive factors. Princeton, NJ: Educational Testing Service, 1963.

Hunt, E. B., Frost, N., & Lunneborg, C. E. Individual differences in cognition. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 7). New York: Academic Press, 1973.

Hunt, E. B., Lunneborg, C., & Lewis, J. What does it mean to be high verbal? Cognitive Psychology, 1975, 7, 194-227.

Jacobs, P. I. A study of large score changes on the SAT (Research Bulletin RR 63-20). Princeton, NJ: Educational Testing Service, June 1963.

Johnson, M. F., & Weiss, D. J. Parallel forms reliability and measurement accuracy comparison of adaptive and conventional testing strategies. In D. J. Weiss (Ed.), Proceedings of the 1979 computerized adaptive testing conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory, 1980.

Kingsbury, G. G., & Weiss, D. J. An alternate-forms reliability and concurrent validity comparison of Bayesian adaptive and conventional ability tests (Research Report 80-5). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, December 1980.

Levine, M., & Drasgow, F. Appropriateness measurement: Basic principles and validating studies. In D. J. Weiss (Ed.), Proceedings of the 1979 computerized adaptive testing conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory, 1980.

Levine, M. V., & Rubin, D. B. Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 1979, 269-290.

Lunneborg, C. E. Information-processing correlates of the Raven Progressive Matrices Tests (Technical Report 77-10). Seattle, WA: University of Washington, Educational Assessment Center, 1977.

McBride, J. R. Adaptive verbal ability testing in a military setting. In D. J. Weiss (Ed.), Proceedings of the 1979 computerized adaptive testing conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory, 1980.

McBride, J. R., & Weiss, D. J. Some properties of a Bayesian adaptive ability testing strategy (Research Report 76-1). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, March 1976.

Reckase, M. D. The effect of item pool characteristics on the operation of a tailored testing procedure. Paper presented at the spring meeting of the Psychometric Society, Murray Hill, NJ, April 1976.

Reckase, M. D. Ability estimation and item calibration using the one and three parameter logistic models: A comparative study (Research Report 77-1). Columbia: University of Missouri, Educational Psychology Department, Tailored Testing Research Laboratory, November 1977.

Reckase, M. D. Unifactor latent trait models applied to multifactor tests: Results and implications. In D. J. Weiss (Ed.), Proceedings of the 1977 computerized adaptive testing conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, 1978.

Rose, A. Human information processing: An assessment and research battery (Technical Report 46). Ann Arbor, MI: University of Michigan, Human Performance Center, 1974.

Sperling, G. The information available in brief visual presentations. Psychological Monographs, 1960, 74 (Whole No. 498).

Sternberg, S. High-speed scanning in human memory. Science, 1966, 153, 652-654.

Trabin, T. E., & Weiss, D. J. The person response curve: Fit of individuals to item characteristic curve models (Research Report 79-7). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, December 1979.

Urry, V. W. A Monte Carlo investigation of logistic mental test models (Doctoral dissertation, Purdue University, 1970). Dissertation Abstracts International, 1971, 31, 6319B. (University Microfilms No. 71-9475)

Urry, V. W. Individualized testing by Bayesian estimation (Research Bulletin 0171-177). Seattle: University of Washington, Bureau of Testing, April 1971.

Vale, C. D., & Weiss, D. J. A simulation study of stradaptive ability testing (Research Report 75-6). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, December 1975. (NTIS No. AD A020961)

Vale, C. D., & Weiss, D. J. A comparison of information functions of multiple-choice and free-response vocabulary items (Research Report 77-2). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, April 1977.

Wood, R. L., & Lord, F. M. A user's guide to LOGIST (Research Memorandum RM-76-4). Princeton, NJ: Educational Testing Service, May 1976.

Wood, R. L., Wingersky, M. S., & Lord, F. M. LOGIST: A computer program for estimating examinee ability and item characteristic curve parameters (ETS RM-76-6). Princeton, NJ: Educational Testing Service, June 1976.

Wright, B. Solving measurement problems with the Rasch model. Journal of Educational Measurement, 1977, 14, 97-115.


Distribution  List 


Navy


1 Dr. Ed Aiken

Navy  Personnel  R&D  Center 

San Diego, CA 92152

1 Dr. Robert Breaux
Code  N-711 

NAVTRAEQUIPCEN

Orlando, FL 32813

1  Dr.  Richard  Elster 

Department of Administrative Sciences
Naval Postgraduate School
Monterey, CA 93940

1  DR.  PAT  FEDERICO 

NAVY PERSONNEL R&D CENTER

SAN  DIEGO,  CA  92152 

1 Dr. John Ford

Navy Personnel R&D Center
San Diego, CA 92152

1  Dr.  Henry  M.  Halff 

Department  of  Psychology, C-009 
University  of  California  at  San  Diego 
La  Jolla,  CA  92093 


1  LT  Steven  D.  Harris,  MSC,  USN 
Code  5021 

Naval  Air  Development  Center 
Warminster, Pennsylvania 18974

1  CDR  Charles  H.  Hutchins 

Naval  Air  Systems  Command  Hq 

AIR-390F 

Navy  Department 

Washington, DC 20361

1  CDR  Robert  S.  Kennedy 

Head,  Human  Performance  Sciences 
Naval  Aerospace  Medical  Research  Lab 
Box 29907

New  Orleans,  LA  70189 

1  Dr.  Norman  J.  Kerr 

Chief  of  Naval  Technical  Training 
Naval Air Station Memphis (75)

Millington,  TN  38059 

1 Dr. William L. Maloy

Principal  Civilian  Advisor  for 

Education and Training

Naval Training Command, Code 00A
Pensacola,  FL  32508 

1  Dr.  Kneale  Marshall 

Scientific Advisor to DCNO(MPT)

OP01T

Washington, DC 20370

1  CAPT  Richard  L.  Martin,  USN 

Prospective Commanding Officer

USS  Carl  Vinson  (CVN-70) 

Newport  News  Shipbuilding  and  Drydock 
Newport  News,  VA  23607 

1 Dr. William Montague

Navy Personnel R&D Center
San  Diego,  CA  92152 


Ted  M.  I.  Yellen 

Technical  Information  Office,  Code  201 
NAVY PERSONNEL R&D CENTER
SAN DIEGO, CA 92152

Library,  Code  P201L 
Navy Personnel R&D Center
San  Diego,  CA  92152 

Technical  Director 
Navy Personnel R&D Center
San  Diego,  CA  92152 

Commanding  Officer 
Naval  Research  Laboratory 
Code  2627 

Washington,  DC  20390 

Psychologist 
ONR  Branch  Office 
Bldg  119,  Section  D 
666  Summer  Street 
Boston,  MA  02210 

Psychologist 
ONR  Branch  Office 
536  S.  Clark  Street 
Chicago,  IL  60605 

Office  of  Naval  Research 
Code  937 

800 N. Quincy Street
Arlington,  VA  22217 

Personnel & Training Research Programs
(Code  958) 

Office  of  Naval  Research 
Arlington, VA 22217

Psychologist 
ONR  Branch  Office 
1030  East  Green  Street 
Pasadena,  CA  91101 

Special Asst. for Education and
Training (OP-01E)

Rm. 2705 Arlington Annex
Washington,  DC  20370 

Office  of  the  Chief  of  Naval  Operations 
Research Development & Studies Branch
(OP-115) 

Washington, DC 20350

Dr. Donald F. Parker

Graduate School of Business Administration
University of Michigan
Ann Arbor, MI 48109

LT  Frank  C.  Petho,  MSC,  USN  (Ph.D) 

Code  L51 

Naval Aerospace Medical Research Laboratory
Pensacola,  FL  32508 

Dr.  Gary  Poock 

Operations  Research  Department 
Code  55PK 

Naval  Postgraduate  School 
Monterey, CA 93940

Roger  W.  Remington,  Ph.D 

Code  L52 

NAMRL 

Pensacola,  FL  32508 


1  Dr.  Bernard  Rimland  (03B) 

Navy Personnel R&D Center
San Diego, CA 92152

1  Dr.  Worth  Scanland 

Chief  of  Naval  Education  and  Training 
Code  N-5 

NAS, Pensacola, FL 32508

1  Dr.  Robert  G.  Smith 

Office  of  Chief  of  Naval  Operations 
OP-987H

Washington,  DC  20350 

1 Dr. Alfred F. Smode

Training Analysis & Evaluation Group
(TAEG)

Dept. of the Navy
Orlando,  FL  32813 

1  Dr.  Richard  Sorensen 

Navy Personnel R&D Center
San  Diego,  CA  92152 

1  Roger  Weissinger-Baylon 

Department  of  Administrative  Sciences 
Naval  Postgraduate  School 
Monterey, CA 93940

1  Dr.  Robert  Wisher 
Code  309 

Navy Personnel R&D Center
San  Diego,  CA  92152 

1  Mr  John  H.  Wolfe 
Code  P310 

U.  S.  Navy  Personnel  Research  and 
Development  Center 
San Diego, CA 92152

Army 


1  Technical  Director 

U.  S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
5001 Eisenhower Avenue
Alexandria,  VA  22333 

1  Dr.  Dexter  Fletcher 

U.S.  Army  Research  Institute 
5001 Eisenhower Avenue
Alexandria, VA  22333 

1  DR.  FRANK  J.  HARRIS 

U.S.  ARMY  RESEARCH  INSTITUTE 
5001  EISENHOWER  AVENUE 
ALEXANDRIA,  VA  22333 

1 Col Frank Hart

Army Research Institute for the
Behavioral & Social Sciences
5001 Eisenhower Blvd.

Alexandria,  VA  22333 

1  Dr.  Michael  Kaplan 

U.S.  ARMY  RESEARCH  INSTITUTE 
5001  EISENHOWER  AVENUE 
ALEXANDRIA,  VA  22333 

1  Dr.  Hilton  S.  Katz 

Training  Technical  Area 
U.S.  Army  Research  Institute 
5001 Eisenhower Avenue
Alexandria, VA 22333


1 Dr. Harold F. O'Neil, Jr.

Attn:  PERI-OK 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria, VA 22333

1  Dr.  Robert  Sasmor 

U.  S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

1 Dr. Frederick Steinheiser
U. S. Army Research Institute
5001 Eisenhower Avenue
Alexandria,  VA  22333 

1  Dr.  Joseph  Ward 

U.S.  Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 


Air Force


1 Dr. Earl A. Alluisi
HQ,  AFHRL  (AFSC) 

Brooks  AFB,  TX  78235 

1  Dr.  Genevieve  Haddad 
Program  Manager 
Life  Sciences  Directorate 
AFOSR 

Bolling  AFB,  DC  20332 

1  Dr.  Marty  Rockway 
Technical  Director 
AFHRL(OT) 

Williams AFB, AZ 85224

2  3700  TCHIW/nCH  Stop  32 
Sheppard  AFB,  TX  76311 


Marines


1 H. William Greenup

Education Advisor (E031)
Education Center, MCDEC
Quantico, VA 22134

1  Special  Assistant  for  Marine 
Corps  Matters 
Code  100M 

Office  of  Naval  Research 
800  N.  Quincy  St. 

Arlington,  VA  22217 

1 DR. A.L. SLAFKOSKY

SCIENTIFIC ADVISOR (CODE RD-1)
HQ, U.S. MARINE CORPS
WASHINGTON, DC 20380

Other  DoD 


12  Defense  Technical  Information  Center 
Cameron  Station,  Bldg  5 
Alexandria, VA 22314
Attn: TC

1 Dr. Craig I. Fields

Advanced  Research  Projects  Agency 
1400  Wilson  Blvd. 

Arlington,  VA  22209 

1  Military  Assistant  for  Training  and 
Personnel  Technology 

Office  of  the  Under  Secretary  of  Defense 
for Research & Engineering
Room  3D129,  The  Pentagon 
Washington,  DC  20301 


DARPA 

1400  Wilson  Blvd. 
Arlington, VA 22209

Civil  Govt 


1 Dr. Susan Chipman

Learning  and  Development 
National  Institute  of  Education 
1200  19th  Street  NW 
Washington,  DC  20208 

1  Dr.  Joseph  I.  Lipson 
SEDR  W-638 

National  Science  Foundation 
Washington, DC 20550

1 William J. McLaurin

Rm. 301, Internal Revenue Service
2221  Jefferson  Davis  Highway 
Arlington,  VA  22202 

1 Dr. Arthur Melmed

National Institute of Education
1200  19th  Street  NW 
Washington,  DC  20208 

1  Dr.  Andrew  R.  Molnar 
Science  Education  Dev. 
and  Research 

National  Science  Foundation 
Washington,  DC  20550 

1  Dr.  Frank  Withrow 

U.  S.  Office  of  Education 
400  Maryland  Ave.  SW 
Washington,  DC  20202 

1 Dr. Joseph L. Young, Director
Memory & Cognitive Processes
National  Science  Foundation 
Washington,  DC  20550 

Non Govt


Dr.  John  R.  Anderson 
Department  of  Psychology 
Carnegie Mellon University
Pittsburgh,  PA  15213 

Anderson,  Thomas  H.,  Ph.D. 
Center  for  the  Study  of  Reading 
174  Children's  Research  Center 
51  Gerty  Drive 
Champaign, IL 61820

Dr.  John  Annett 
Department  of  Psychology 
University  of  Warwick 
Coventry  CV4  7AL 
ENGLAND 

DR. MICHAEL ATWOOD
SCIENCE  APPLICATIONS  INSTITUTE 
40  DENVER  TECH.  CENTER  WEST 
7935  E.  PRENTICE  AVENUE 
ENGLEWOOD,  CO  80110 

1 Psychological Research Unit
Dept. of Defense (Army Office)
Campbell  Park  Offices 
Canberra  ACT  2600,  Australia 

Dr.  Alan  Baddeley 
Medical  Research  Council 

Applied  Psychology  Unit 
15  Chaucer  Road 
Cambridge  CB2  2EF 
ENGLAND 


1  Dr.  Patricia  Baggett 

Department  of  Psychology 
University  of  Denver 
University  Park 
Denver,  CO  80208 

1 Mr Avron Barr

Department  of  Computer  Science 
Stanford  University 
Stanford, CA 94305

1  Dr.  Nicholas  A.  Bond 
Dept. of Psychology
Sacramento  State  College 
600  Jay  Street 
Sacramento,  CA  95819 

1  Dr.  Lyle  Bourne 

Department  of  Psychology 
University  of  Colorado 

Boulder, CO 80309

1 Dr. John S. Brown

XEROX  Palo  Alto  Research  Center 
3333  Coyote  Road 
Palo  Alto,  CA  94304 

1  Dr.  Bruce  Buchanan 

Department of Computer Science
Stanford  University 
Stanford,  CA  94305 

1  DR.  C.  VICTOR  BUNDERSON 
WICAT  INC. 

UNIVERSITY  PLAZA,  SUITE  10 
1160  SO.  STATE  ST. 

OREM,  UT  84057 

1  Dr.  Pat  Carpenter 

Department of Psychology
Carnegie-Mellon University
Pittsburgh, PA 15213

1 Dr. John B. Carroll
Psychometric Lab
Univ. of No. Carolina
Davie  Hall  013A 
Chapel  Hill,  NC  27514 

1  Charles  Myers  Library 
Livingstone  House 
Livingstone  Road 
Stratford 
London  E15  2LJ 
ENGLAND

1  Dr.  William  Chase 

Department  of  Psychology 
Carnegie  Mellon  University 
Pittsburgh,  PA  15213 

1 Dr. Michelene Chi

Learning R&D Center
University  of  Pittsburgh 
3939  O'Hara  Street 
Pittsburgh,  PA  15213 

1  Dr.  William  Clancey 

Department  of  Computer  Science 
Stanford  University 
Stanford,  CA  94305 

1 Dr. Allan M. Collins

Bolt Beranek & Newman, Inc.

50 Moulton Street
Cambridge, MA 02138

1  Dr.  Lynn  A.  Cooper 
LRDC 

University  of  Pittsburgh 
3939  O'Hara  Street 
Pittsburgh,  PA  15213 




1  Dr.  Meredith  P.  Crawford 

American Psychological Association
1200 17th Street, N.W.

Washington,  DC  20036 

1 Dr. Kenneth B. Cross
Anacapa  Sciences,  Inc. 

P.O.  Drawer  Q 

Santa  Barbara,  CA  93102 

1  Dr.  Ronna  Dillon 

Department of Guidance and Educational Psychology
Southern  Illinois  University 
Carbondale,  IL  62901 

1 Dr. Hubert Dreyfus

Department  of  Philosophy 
University  of  California 
Berkeley, CA 94720

1  LCOL  J.  C.  Eggenberger 

DIRECTORATE OF PERSONNEL APPLIED RESEARCH
NATIONAL  DEFENCE  HQ 
101  COLONEL  BY  DRIVE 
OTTAWA, CANADA K1A 0K2

1  Dr.  Ed  Feigenbaum 

Department  of  Computer  Science 
Stanford  University 
Stanford, CA 94305

1  Dr.  Richard  L.  Ferguson 

The  American  College  Testing  Program 

P.O.  Box  168 

Iowa City, IA 52240

1 Mr. Wallace Feurzeig

Bolt Beranek & Newman, Inc.

50 Moulton St.

Cambridge, MA 02138

1 Dr. Victor Fields
Dept. of Psychology
Montgomery  College 
Rockville,  MD  20850 

1 Dr. John R. Frederiksen
Bolt Beranek & Newman
50  Moulton  Street 
Cambridge,  MA  02138 

1 Dr. Alinda Friedman

Department  of  Psychology 
University  of  Alberta 
Edmonton, Alberta
CANADA  T6G  2E9 

1  Dr.  R.  Edward  Geiselman 
Department  of  Psychology 
University  of  California 
Los  Angeles,  CA  90024 

1  DR.  ROBERT  GLASER 
LRDC 

UNIVERSITY  OF  PITTSBURGH 
3939 O'HARA STREET
PITTSBURGH,  PA  15213 

1 Dr. Marvin D. Glock
217  Stone  Hall 
Cornell  University 
Ithaca,  NY  14853 

1  Dr.  Daniel  Gopher 

Industrial & Management Engineering
Technion-Israel Institute of Technology
Haifa 
ISRAEL 

1  DR.  JAMES  G.  GREENO 
LRDC 

UNIVERSITY  OF  PITTSBURGH 
3939  O'HARA  STREET 
PITTSBURGH,  PA  15213 


1 Dr. Harold Hawkins

Department  of  Psychology 
University  of  Oregon 
Eugene, OR 97403

1 Dr. Barbara Hayes-Roth
The  Rand  Corporation 
1700  Main  Street 
Santa  Monica,  CA  90406 

1  Dr.  Frederick  Hayes-Roth 
The  Rand  Corporation 
1700  Main  Street 
Santa  Monica,  CA  90406 

1 Dr. James R. Hoffman

Department  of  Psychology 
University  of  Delaware 
Newark, DE 19711

1 Glenda Greenwald, Ed.

"Human  Intelligence  Newsletter" 

P.  0.  Box  1163 
Birmingham, MI 48012

1 Dr. Earl Hunt

Dept. of Psychology
University of Washington
Seattle, WA 98105

1 Dr. Steven W. Keele
Dept. of Psychology
University of Oregon
Eugene, OR 97403

1 Dr. Walter Kintsch

Department  of  Psychology 
University  of  Colorado 
Boulder,  CO  80302 

1 Dr. Kenneth A. Klivington
Program  Officer 
Alfred  P.  Sloan  Foundation 
630  Fifth  Avenue 
New  York,  NY  10111 

1 Dr. Stephen Kosslyn
Harvard  University 
Department  of  Psychology 
33  Kirkland  Street 
Cambridge, MA 02138

1  Mr.  Marlin  Kroger 
1117  Via  Goleta 

Palos  Verdes  Estates,  CA  90274 

1 Dr. Jill Larkin

Department  of  Psychology 
Carnegie  Mellon  University 
Pittsburgh,  PA  15213 

1  Dr.  Alan  Lesgold 
Learning R&D Center
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

1  Dr.  Michael  Levine 

Department  of  Educational  Psychology 
210  Education  Bldg. 

University  of  Illinois 
Champaign,  IL  61801 

1 Dr. Robert A. Levit

Director, Behavioral Sciences
The BDM Corporation
7915 Jones Branch Drive
McLean, VA 22101

1 Dr. Charles Lewis

Faculteit Sociale Wetenschappen
Rijksuniversiteit Groningen
Oude  Boteringestraat 
Groningen 
NETHERLANDS 


1 Dr. Erik McWilliams

Science Education Dev. and Research
National  Science  Foundation 
Washington,  DC  20550 

1  Dr.  Mark  Miller 

Computer  Science  Laboratory 
Texas Instruments, Inc.

Mail Station 371, P.O. Box 225936
Dallas, TX 75265

1  Dr.  Allen  Munro 

Behavioral  Technology  Laboratories 
1845  Elena  Ave.,  Fourth  Floor 
Redondo  Beach,  CA  90277 

1 Dr. Donald A. Norman

Dept. of Psychology C-009
Univ. of California, San Diego
La  Jolla,  CA  92093 

1  Dr.  Jesse  Orlansky 

Institute  for  Defense  Analyses 
400  Army  Navy  Drive 
Arlington,  VA  22202 

1  Dr.  Seymour  A.  Papert 

Massachusetts  Institute  of  Technology 
Artificial  Intelligence  Lab 
545  Technology  Square 
Cambridge, MA 02139

1  Dr.  James  A.  Paulson 

Portland  State  University 
P.O.  Box  751 
Portland,  OR  97207 

1 MR. LUIGI PETRULLO

2431 N. EDGEWOOD STREET
ARLINGTON,  VA  22207 

1 Dr. Martha Polson

Department  of  Psychology 

University  of  Colorado 

Boulder,  CO  80302 

1 DR. PETER POLSON
DEPT.  OF  PSYCHOLOGY 
UNIVERSITY  OF  COLORADO 
BOULDER,  CO  80309 

1  Dr.  Steven  E.  Poltrock 

Department  of  Psychology 

University  of  Denver 

Denver, CO 80208

1 MINRAT M. L. RAUCH
P II 4

BUNDESMINISTERIUM DER VERTEIDIGUNG

POSTFACH  1328 

D-53  BONN  1,  GERMANY 

1 Dr. Fred Reif

SESAME

c/o Physics Department
University of California
Berkeley, CA 94720

1 Dr. Lauren Resnick
LRDC 

University  of  Pittsburgh 
3939  O'Hara  Street 
Pittsburgh,  PA  15213 

1 Dr. Andrew M. Rose

American  Institutes  for  Research 
1055 Thomas Jefferson St. NW
Washington,  DC  20007 

1 Dr. Ernst Z. Rothkopf
Bell  Laboratories 
600  Mountain  Avenue 
Murray  Hill,  NJ  07974 


1 Dr. David Rumelhart

Center for Human Information Processing
Univ. of California, San Diego
La Jolla, CA 92093

1  DR.  WALTER  SCHNEIDER 
DEPT.  OF  PSYCHOLOGY 
UNIVERSITY  OF  ILLINOIS 
CHAMPAIGN,  IL  61820 

1  Dr.  Alan  Schoenfeld 

Department of Mathematics
Hamilton  College 
Clinton,  NY  13323 

1  DR.  ROBERT  J.  SEIDEL 

INSTRUCTIONAL  TECHNOLOGY  GROUP 
HUMRRO

300 N. WASHINGTON ST.

ALEXANDRIA, VA 22314

1 Committee on Cognitive Research
c/o Dr. Lonnie R. Sherrod
Social  Science  Research  Council 
605  Third  Avenue 
New  York,  NY  10016 

1 Robert S. Siegler
Associate Professor
Carnegie-Mellon University
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 

1 Dr. Edward E. Smith

Bolt Beranek & Newman, Inc.

50  Moulton  Street 
Cambridge,  MA  02138 

1 Dr. Robert Smith

Department  of  Computer  Science 
Rutgers  University 
New  Brunswick,  NJ  08903 

1 Dr. Richard Snow
School of Education
Stanford University
Stanford, CA 94305

1 Dr. Robert Sternberg
Dept. of Psychology
Yale  University 
Box  11A,  Yale  Station 
New  Haven,  CT  06520 

1  DR.  ALBERT  STEVENS 

BOLT BERANEK & NEWMAN, INC.

50  MOULTON  STREET 
CAMBRIDGE,  MA  02138 

1 David E. Stone, Ph.D.

Hazeltine Corporation
7680 Old Springhouse Road
McLean,  VA  22102 

1  DR.  PATRICK  SUPPES 

INSTITUTE  FOR  MATHEMATICAL  STUDIES  IN 
THE  SOCIAL  SCIENCES 
STANFORD  UNIVERSITY 
STANFORD, CA 94305

1 Dr. Kikumi Tatsuoka

Computer  Based  Education  Research 
Laboratory 

252  Engineering  Research  Laboratory 
University  of  Illinois 
Urbana, IL 61801


1  Dr.  John  Thomas 

IBM  Thomas  J.  Watson  Research  Center 
P.O.  Box  218 

Yorktown  Heights,  NY  10598 

1  DR.  PERRY  THORNDYKE 
THE  RAND  CORPORATION 
1700  MAIN  STREET 
SANTA  MONICA,  CA  90406

1  Dr.  Douglas  Towne

Univ.  of  So.  California
Behavioral  Technology  Labs 
1845  S.  Elena  Ave. 

Redondo  Beach,  CA  90277

1  Dr.  J.  Uhlaner 

Perceptronics,  Inc.

6271  Variel  Avenue
Woodland  Hills,  CA  91364 

1  Dr.  Benton  J.  Underwood 
Dept.  of  Psychology
Northwestern  University 
Evanston,  IL  60201 

1  Dr.  Phyllis  Weaver

Graduate  School  of  Education 
Harvard  University 
200  Larsen  Hall,  Appian  Way
Cambridge,  MA  02138

1  Dr.  David  J.  Weiss 
N660  Elliott  Hall 
University  of  Minnesota 
75  E.  River  Road 
Minneapolis,  MN  55455

1  DR.  GERSHON  WELTMAN 
PERCEPTRONICS  INC. 

6271  VARIEL  AVE. 

WOODLAND  HILLS,  CA  91367 

1  Dr.  Keith  T.  Wescourt

Information  Sciences  Dept. 

The  Rand  Corporation 
1700  Main  St. 


Previous  Publications


Proceedings  of  the  1977  Computerized  Adaptive  Testing  Conference.
July  1978.


Research  Reports

81-2.  Effects  of  Immediate  Feedback  and  Pacing  of  Item  Presentation  on  Ability
Test  Performance  and  Psychological  Reactions  to  Testing.  February
1981.

81-1.  Review  of  Test  Theory  and  Methods.  January  1981.

80-5.  An  Alternate-Forms  Reliability  and  Concurrent  Validity  Comparison  of
Bayesian  Adaptive  and  Conventional  Ability  Tests.  December  1980.

80-4.  A  Comparison  of  Adaptive,  Sequential,  and  Conventional  Testing  Strategies
for  Mastery  Decisions.  November  1980.

80-3.  Criterion-Related  Validity  of  Adaptive  Testing  Strategies.  June  1980.

80-2.  Interactive  Computer  Administration  of  a  Spatial  Reasoning  Test.  April
1980.

Final  Report:  Computerized  Adaptive  Performance  Evaluation.  February
1980.

80-1.  Effects  of  Immediate  Knowledge  of  Results  on  Achievement  Test  Performance
and  Test  Dimensionality.  January  1980.

79-7.  The  Person  Response  Curve:  Fit  of  Individuals  to  Item  Characteristic
Curve  Models.  December  1979.

79-6.  Efficiency  of  an  Adaptive  Inter-Subtest  Branching  Strategy  in  the
Measurement  of  Classroom  Achievement.  November  1979.

79-5.  An  Adaptive  Testing  Strategy  for  Mastery  Decisions.  September  1979.

79-4.  Effect  of  Point-in-Time  in  Instruction  on  the  Measurement  of  Achievement.
August  1979.

79-3.  Relationships  among  Achievement  Level  Estimates  from  Three  Item
Characteristic  Curve  Scoring  Methods.  April  1979.

Final  Report:  Bias-Free  Computerized  Testing.  March  1979.

79-2.  Effects  of  Computerized  Adaptive  Testing  on  Black  and  White  Students.
March  1979.

79-1.  Computer  Programs  for  Scoring  Test  Data  with  Item  Characteristic  Curve
Models.  February  1979.

78-5.  An  Item  Bias  Investigation  of  a  Standardized  Aptitude  Test.  December
1978.

78-4.  A  Construct  Validation  of  Adaptive  Achievement  Testing.  November  1978.

78-3.  A  Comparison  of  Levels  and  Dimensions  of  Performance  in  Black  and  White
Groups  on  Tests  of  Vocabulary,  Mathematics,  and  Spatial  Ability.
October  1978.

78-2.  The  Effects  of  Knowledge  of  Results  and  Test  Difficulty  on  Ability  Test
Performance  and  Psychological  Reactions  to  Testing.  September  1978.

78-1.  A  Comparison  of  the  Fairness  of  Adaptive  and  Conventional  Testing
Strategies.  August  1978.

77-7.  An  Information  Comparison  of  Conventional  and  Adaptive  Tests  in  the
Measurement  of  Classroom  Achievement.  October  1977.

77-6.  An  Adaptive  Testing  Strategy  for  Achievement  Test  Batteries.  October
1977.

77-5.  Calibration  of  an  Item  Pool  for  the  Adaptive  Measurement  of  Achievement.
September  1977.

77-4.  A  Rapid  Item-Search  Procedure  for  Bayesian  Adaptive  Testing.  May  1977.

77-3.  Accuracy  of  Perceived  Test-Item  Difficulties.  May  1977.

77-2.  A  Comparison  of  Information  Functions  of  Multiple-Choice  and  Free-Response
Vocabulary  Items.  April  1977.

77-1.  Applications  of  Computerized  Adaptive  Testing.  March  1977.

Final  Report:  Computerized  Ability  Testing,  1972-1975.  April  1976.

76-5.  Effects  of  Item  Characteristics  on  Test  Fairness.  December  1976.

76-4.  Psychological  Effects  of  Immediate  Knowledge  of  Results  and  Adaptive
Ability  Testing.  June  1976.

76-3.  Effects  of  Immediate  Knowledge  of  Results  and  Adaptive  Testing  on  Ability
Test  Performance.  June  1976.

76-2.  Effects  of  Time  Limits  on  Test-Taking  Behavior.  April  1976.

76-1.  Some  Properties  of  a  Bayesian  Adaptive  Ability  Testing  Strategy.  March
1976.

75-6.  A  Simulation  Study  of  Stradaptive  Ability  Testing.  December  1975.

75-5.  Computerized  Adaptive  Trait  Measurement:  Problems  and  Prospects.
November  1975.

75-4.  A  Study  of  Computer-Administered  Stradaptive  Ability  Testing.  October
1975.

75-3.  Empirical  and  Simulation  Studies  of  Flexilevel  Ability  Testing.  July
1975.

75-2.  TETREST:  A  FORTRAN  IV  Program  for  Calculating  Tetrachoric  Correlations.
March  1975.

75-1.  An  Empirical  Comparison  of  Two-Stage  and  Pyramidal  Adaptive  Ability
Testing.  February  1975.

74-5.  Strategies  of  Adaptive  Ability  Measurement.  December  1974.

74-4.  Simulation  Studies  of  Two-Stage  Ability  Testing.  October  1974.

74-3.  An  Empirical  Investigation  of  Computer-Administered  Pyramidal  Ability
Testing.  July  1974.

74-2.  A  Word  Knowledge  Item  Pool  for  Adaptive  Ability  Measurement.  June  1974.

74-1.  A  Computer  Software  System  for  Adaptive  Ability  Measurement.  January
1974.

73-4.  An  Empirical  Study  of  Computer-Administered  Two-Stage  Ability  Testing.
October  1973.

73-3.  The  Stratified  Adaptive  Computerized  Ability  Test.  September  1973.

73-2.  Comparison  of  Four  Empirical  Item  Scoring  Procedures.  August  1973.

73-1.  Ability  Measurement:  Conventional  or  Adaptive?  February  1973.

Copies  of  these  reports  are  available,  while  supplies  last,  from:
Computerized  Adaptive  Testing  Laboratory
N660  Elliott  Hall
University  of  Minnesota
75  East  River  Road
Minneapolis  MN  55455  U.S.A.


